Abstract: This paper gives max characterizations for the sum of the largest eigenvalues of a symmetric matrix. The elements which achieve the maximum provide a concise characterization of the generalized gradient of the eigenvalue sum in terms of a dual matrix. The dual matrix provides the information required to either verify first-order optimality conditions at a point or to generate a descent direction for the eigenvalue sum from that point, splitting a multiple eigenvalue if necessary. A model minimization algorithm is outlined, and connections with the classical literature on sums of eigenvalues are explained. Sums of the largest eigenvalues in absolute value are also addressed.

Key  words:  symmetric  matrix,  maximum  eigenvalues,  spectral  radius,  minimax 
1      Introduction

Let $A$ be an $n$ by $n$ real symmetric matrix, and let $\kappa \in \{1,\dots,n\}$. Denote the eigenvalues of $A$ by $\lambda_1,\dots,\lambda_n$ and also by $\mu_1,\dots,\mu_n$, the difference being that the former are ordered by

$$ \lambda_1 \ge \cdots \ge \lambda_n, \tag{1.1} $$

while the latter are ordered by

$$ |\mu_1| \ge \cdots \ge |\mu_n|. \tag{1.2} $$

Define

$$ f_\kappa(A) = \sum_{i=1}^{\kappa} \lambda_i, \tag{1.3} $$

and

$$ g_\kappa(A) = \sum_{i=1}^{\kappa} |\mu_i|. \tag{1.4} $$

Note that $f_1(A)$ is the largest eigenvalue of $A$, while $g_1(A)$ is the spectral radius of $A$ (its largest eigenvalue in absolute value).
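The definitions (1.3) and (1.4) can be evaluated directly from a symmetric eigenvalue decomposition. The following is a minimal sketch in Python with NumPy; the function names `f_kappa` and `g_kappa` are illustrative choices, not notation from the paper.

```python
import numpy as np

def f_kappa(A, kappa):
    """Sum of the kappa largest eigenvalues of the symmetric matrix A, as in (1.3)."""
    lam = np.linalg.eigvalsh(A)              # eigenvalues in ascending order
    return float(np.sum(lam[-kappa:]))

def g_kappa(A, kappa):
    """Sum of the kappa largest eigenvalues in absolute value, as in (1.4)."""
    mu = np.sort(np.abs(np.linalg.eigvalsh(A)))
    return float(np.sum(mu[-kappa:]))

A = np.diag([3.0, -4.0, 1.0])                # eigenvalues 3, 1, -4
f1 = f_kappa(A, 1)                           # largest eigenvalue
g1 = g_kappa(A, 1)                           # spectral radius
f2 = f_kappa(A, 2)
g2 = g_kappa(A, 2)
```

For this diagonal example the two orderings differ: the largest eigenvalue is 3, while the spectral radius is 4.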

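After establishing some further notation in Section 2, the main results of the paper are given in Sections 3 and 4 for the functions $f_\kappa(A)$ and $g_\kappa(A)$ respectively. Max characterizations of these functions are established in terms of the Frobenius inner product $\langle A, B \rangle = \mathrm{tr}(AB^T)$ for $A, B \in \mathbf{R}^{m \times n}$ (where $\mathrm{tr}(A)$ denotes the trace of $A$), and sets of matrices defined by positive semi-definite inequalities (see Section 2). Let $\mathcal{S}_n$ denote the set of real $n$ by $n$ symmetric matrices,

$$ \Phi_{n,\kappa} = \{\, U \in \mathcal{S}_n : 0 \le U \le I,\ \mathrm{tr}(U) = \kappa \,\} \tag{1.5} $$

and

$$ \Psi_{n,\kappa} = \{\, W \in \mathcal{S}_n : W = U - V, \text{ where } U, V \in \mathcal{S}_n,\ 0 \le U \le I,\ 0 \le V \le I,\ \mathrm{tr}(U) + \mathrm{tr}(V) = \kappa \,\}. \tag{1.6} $$

The sets $\Phi_{n,\kappa}$ and $\Psi_{n,\kappa}$ are compact convex subsets of $\mathcal{S}_n$. It is shown that

$$ f_\kappa(A) = \max_{U \in \Phi_{n,\kappa}} \langle A, U \rangle \tag{1.7} $$

and

$$ g_\kappa(A) = \max_{W \in \Psi_{n,\kappa}} \langle A, W \rangle. \tag{1.8} $$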

In fact, equation (1.7) implies a well known result of Fan (1949), namely

$$ f_\kappa(A) = \max_{Z \in \mathbf{R}^{n \times \kappa},\ Z^T Z = I_\kappa} \mathrm{tr}(Z^T A Z). \tag{1.9} $$

Fan's result is widely referenced; see Wielandt (1955), Cullum, Donath and Wolfe (1975), Friedland (1981), Sameh and Wisniewski (1982), Horn and Johnson (1985) and Fletcher (1985). Both (1.7) and (1.9) show that $f_\kappa(A)$ is a convex function. The advantage of (1.7) over (1.9) is that it leads directly to a characterization of the subdifferential of $f_\kappa$ which is computationally very useful, since it does not involve a convex hull operation. Equation (1.8) has a similar advantage over a corresponding analogue of (1.9).
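The max characterization (1.7) can be checked numerically on a small example. The sketch below assumes a random symmetric test matrix and a simple, illustrative way of sampling feasible matrices $U$ (an orthogonal matrix times a diagonal matrix whose entries average a few 0/1 vectors); neither the sampling scheme nor the variable names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, kappa = 5, 2

M = rng.standard_normal((n, n))
A = (M + M.T) / 2                         # random symmetric test matrix

lam, Q = np.linalg.eigh(A)                # ascending eigenvalues
f = lam[-kappa:].sum()                    # f_kappa(A)

Z = Q[:, -kappa:]                         # eigenvectors for the kappa largest eigenvalues
U_star = Z @ Z.T                          # candidate maximizer: 0 <= U <= I, tr(U) = kappa
attained = np.trace(A @ U_star)           # <A, U_star>

def random_feasible_U():
    """Return Zr diag(u) Zr^T with Zr orthogonal, 0 <= u <= 1 and sum(u) = kappa."""
    Zr, _ = np.linalg.qr(rng.standard_normal((n, n)))
    base = np.r_[np.ones(kappa), np.zeros(n - kappa)]
    # averaging 0/1 vectors with exactly kappa ones keeps u inside the feasible set
    u = np.mean([rng.permutation(base) for _ in range(4)], axis=0)
    return Zr @ np.diag(u) @ Zr.T

# smallest observed value of f_kappa(A) - <A, U> over the sampled feasible U
worst_gap = min(f - np.trace(A @ random_feasible_U()) for _ in range(200))
```

Sampling does not prove (1.7), but a negative `worst_gap` would contradict it.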

In the case $\kappa = 1$, (1.9) reduces to the Rayleigh principle

$$ f_1(A) = \max_{q^T q = 1} q^T A q, \tag{1.10} $$

while (1.7) reduces to

$$ f_1(A) = \max_{U \ge 0,\ \mathrm{tr}(U) = 1} \langle A, U \rangle. \tag{1.11} $$

Note that the inequality $U \le I$ is not required. Equation (1.11) is moderately well known; see Fletcher (1985) and Overton (1988, 1990).
Now consider the composite functions

$$ f_\kappa(x) = f_\kappa(A(x)), \qquad g_\kappa(x) = g_\kappa(A(x)), \tag{1.12} $$

where $A(x)$ is a smooth symmetric matrix function defined on a vector of parameters $x \in \mathbf{R}^m$. The use of the same symbol $f_\kappa$ for a function defined on the set of symmetric matrices and on the parameter space $\mathbf{R}^m$ is convenient, and the distinction should be clear from the context. The max characterizations (1.7) and (1.8) prove that $f_\kappa(x)$ and $g_\kappa(x)$ are locally Lipschitz, subdifferentially regular, and have generalized gradients $\partial f_\kappa(x)$ and $\partial g_\kappa(x)$ respectively, which are nonempty compact convex sets in $\mathbf{R}^m$ (see Clarke (1983)). These generalized gradients are obtained by composing the subdifferential of $f_\kappa(A)$ and $g_\kappa(A)$ with the derivative of $A(x)$, using the chain rule. In fact Clarke (1983, Proposition 2.8.8) derives an expression for the generalized gradient of the largest eigenvalue $f_1(x)$, but in a form which requires a convex hull operation. An important feature of our max characterizations is that they lead to first-order optimality conditions which are computationally verifiable, providing matrix analogues of Lagrange multipliers in constrained optimization, namely $U$ or $U$ and $V$, which we call dual matrices. The necessary condition $0 \in \partial f_\kappa(x)$ or $0 \in \partial g_\kappa(x)$ (see Clarke (1983)) provides systems of linear equations which can be solved to obtain the dual matrices. Inequalities of the form $0 \le U \le I$ determine if the current point is a stationary point, or provide information from which a descent direction can be calculated.

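Equations (1.7) and (1.8) show that if $A(x)$ is an affine function then $f_\kappa(x)$ and $g_\kappa(x)$ are convex functions. They also illustrate that minimizing $f_\kappa(x)$ or $g_\kappa(x)$ are minimax problems:

$$ \min_{x \in \mathbf{R}^m} f_\kappa(x) = \min_{x \in \mathbf{R}^m} \max_{U \in \Phi_{n,\kappa}} \langle A(x), U \rangle, \tag{1.13} $$

and

$$ \min_{x \in \mathbf{R}^m} g_\kappa(x) = \min_{x \in \mathbf{R}^m} \max_{W \in \Psi_{n,\kappa}} \langle A(x), W \rangle. \tag{1.14} $$

If $A(x)$ is affine, the saddle point result

$$ \min_{x \in \mathbf{R}^m} \max_{U \in \Phi_{n,\kappa}} \langle A(x), U \rangle = \max_{U \in \Phi_{n,\kappa}} \min_{x \in \mathbf{R}^m} \langle A(x), U \rangle, $$

and a similar result for (1.14) are established. This is similar to the result of Shapiro (1985) for minimizing a function of a symmetric matrix subject to positive semi-definite constraints. These saddle point results justify the dual matrix terminology.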
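If the eigenvalues of $A(x)$ are distinct then (1.13) and (1.14) are just minimax problems with smooth functions $\lambda_i(x) : \mathbf{R}^m \to \mathbf{R}$ for $i = 1,\dots,n$. Let $\lambda(x) = (\lambda_1(x),\dots,\lambda_n(x))^T$, using any ordering for the eigenvalues. Then

$$ f_\kappa(x) = \max_{u \in \phi_{n,\kappa}} \lambda(x)^T u \tag{1.15} $$

where

$$ \phi_{n,\kappa} = \Big\{\, u \in \mathbf{R}^n : 0 \le u_i \le 1,\ i = 1,\dots,n,\ \sum_{i=1}^{n} u_i = \kappa \,\Big\}, \tag{1.16} $$

and

$$ g_\kappa(x) = \max_{w \in \psi_{n,\kappa}} \lambda(x)^T w \tag{1.17} $$

where

$$ \psi_{n,\kappa} = \Big\{\, w \in \mathbf{R}^n : -1 \le w_i \le 1,\ i = 1,\dots,n,\ \sum_{i=1}^{n} |w_i| = \kappa \,\Big\}. \tag{1.18} $$

The additional complication in (1.13) and (1.14) arises from the possibility of multiple eigenvalues; hence the positive semi-definite constraints defining the sets $\Phi_{n,\kappa}$ and $\Psi_{n,\kappa}$.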

This paper is not concerned with algorithm development. However, a brief discussion of model algorithms for minimizing $f_\kappa(x)$ and $g_\kappa(x)$ is given. These are generalizations of the algorithm presented by Overton (1988) for the case when $\kappa = 1$ and $A(x)$ is affine. It is, in fact, possible to design the model algorithms so that they have quadratic local convergence, even if the objective function is not smooth at the solution; see Overton (1988) and Overton (1990). More detail will be given by Overton and Womersley (to appear). Cullum, Donath and Wolfe (1975) gave an algorithm, related to the $\epsilon$-subgradient methods of Lemarechal and others, for the case that the variables $x$ are the diagonal elements of $A(x)$. The most important difference between this earlier work and our model algorithms is that the latter compute the dual matrices which demonstrate optimality. These matrices are also the key to sensitivity analysis of the solution; see Overton (1990, Section 3).


Sums of eigenvalues of symmetric matrices have been addressed in one form or another in many classical papers on matrix theory; a good overview is Bellman (1970, Chapter 8). However, the rich interconnection between this subject and the sets $\Phi_{n,\kappa}$ and $\Psi_{n,\kappa}$ appears to have been largely overlooked. Note that although all our results are given in terms of real symmetric matrices, generalization to the complex Hermitian case is straightforward.

The classical literature on sums of eigenvalues does not include much discussion of applications. However, these appear to be quite numerous, especially in connection with adjacency matrices of graphs. See in particular Cullum, Donath and Wolfe (1975), as well as Rendl and Wolkowicz (1990), and Alizadeh (1991). Another application is the "orthogonal Procrustes" problem, which refers to rotating a number of matrices towards a best least squares fit. This problem is discussed by Shapiro and Botha (1988) and has also been addressed by Watson (1990). There is a large variety of applications in the case $\kappa = 1$; see Overton (1990).

2      Notation 

The  following  notation  is  used  throughout  this  paper.  Let 

1. $\mathcal{S}_n$ = set of all $n$ by $n$ real symmetric matrices ($A^T = A$).

2. $\mathcal{K}_n$ = set of all $n$ by $n$ real skew-symmetric matrices ($A^T = -A$).

3. $\mathcal{D}_n$ = set of all $n$ by $n$ real diagonal matrices.

4. $\mathcal{O}_{m,n}$ = set of all $m$ by $n$ real orthogonal matrices, where $m \ge n$. Thus $Z^T Z = I_n$ for all $Z \in \mathcal{O}_{m,n}$.

The vector $e_i$ is the $i$th coordinate vector, $e$ is the vector of all 1s, and $I$ is the identity matrix, with the dimensions of $e_i$, $e$ and $I$ determined by the context. A matrix $D \in \mathcal{D}_n$ is denoted by $\mathrm{diag}(\alpha_1,\dots,\alpha_n)$, or $\mathrm{diag}(u)$ where $u \in \mathbf{R}^n$. The convex hull of a set $\Omega$ is denoted by $\mathrm{conv}\,\Omega$.

For $A \in \mathbf{R}^{n \times n}$ the eigenvalues of $A$ are denoted by (1.1) and by (1.2). The trace of $A$ is

$$ \mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} \mu_i. $$

The positive semi-definite partial ordering on $\mathcal{S}_n$ is used to express matrix inequalities (see Golub and van Loan (1985) or Horn and Johnson (1985) for example). Thus $A \ge 0$ means that $A$ is positive semi-definite (equivalently $y^T A y \ge 0$ for all $y$, or $\lambda_i \ge 0$ for $i = 1,\dots,n$). Hence $A \ge B$ means that $A - B$ is positive semi-definite. For example the constraints $0 \le A \le I$ on $A \in \mathcal{S}_n$ mean that $0 \le \lambda_i \le 1$ for $i = 1,\dots,n$.


The Frobenius inner product $\langle A, B \rangle$ of two matrices $A, B \in \mathbf{R}^{m \times n}$ is

$$ \langle A, B \rangle = \mathrm{tr}(AB^T) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ij}. $$

This inner product is the natural extension for real matrix variables of the standard inner product $x^T y = \sum_{i=1}^{n} x_i y_i$ on $\mathbf{R}^n$. It is widely used in problems with matrix variables, for example in the work of Bellman and Fan (1963), Arnold (1971), Craven and Mond (1981). Fletcher (1985), Overton (1988), and Overton and Womersley (1988) used the notation $A : B$ for $\langle A, B \rangle$.

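Some useful properties of the Frobenius inner product are summarized below.

1. $\langle A, A \rangle = \|A\|_F^2$.

2. $\langle A, I \rangle = \mathrm{tr}(A)$.

3. If $A \in \mathcal{S}_n$ and $K \in \mathcal{K}_n$ then $\langle A, K \rangle = 0$.

4. As $\mathrm{tr}(AB^T) = \mathrm{tr}(BA^T) = \mathrm{tr}(A^T B)$,

$$ \langle A, B \rangle = \langle B, A \rangle = \langle A^T, B^T \rangle = \langle B^T, A^T \rangle. $$

In particular

(a) For any nonsingular matrices $S \in \mathbf{R}^{m \times m}$ and $T \in \mathbf{R}^{n \times n}$

$$ \langle A, B \rangle = \langle S^{-1} A, S^T B \rangle = \langle A T^{-1}, B T^T \rangle. $$

(b) If $A \in \mathbf{R}^{n \times n}$, $B \in \mathbf{R}^{m \times m}$ and $Z \in \mathbf{R}^{n \times m}$ then

$$ \langle Z^T A Z, B \rangle = \langle A, Z B Z^T \rangle. $$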
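The properties listed above are easy to confirm numerically. The following sketch checks each of them on random matrices; the helper `inner` and the particular test matrices are illustrative assumptions, and the shift by `5 * np.eye(...)` is only there to keep `S` and `T` safely nonsingular.

```python
import numpy as np

def inner(A, B):
    """Frobenius inner product <A, B> = tr(A B^T)."""
    return float(np.trace(A @ B.T))

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
S = rng.standard_normal((m, m)) + 5 * np.eye(m)
T = rng.standard_normal((n, n)) + 5 * np.eye(n)

G = rng.standard_normal((n, n))
Asym = G + G.T                          # symmetric
K = G - G.T                             # skew-symmetric
Bm = rng.standard_normal((m, m))
Z = rng.standard_normal((n, m))

checks = {
    "1: <A,A> = ||A||_F^2": np.isclose(inner(A, A), np.linalg.norm(A, "fro") ** 2),
    "2: <A,I> = tr(A)": np.isclose(inner(Asym, np.eye(n)), np.trace(Asym)),
    "3: symmetric vs skew": np.isclose(inner(Asym, K), 0.0),
    "4: symmetry": np.isclose(inner(A, B), inner(B, A))
                   and np.isclose(inner(A, B), inner(A.T, B.T)),
    "4a: left scaling": np.isclose(inner(A, B), inner(np.linalg.solve(S, A), S.T @ B)),
    "4a: right scaling": np.isclose(inner(A, B), inner(A @ np.linalg.inv(T), B @ T.T)),
    "4b: congruence": np.isclose(inner(Z.T @ Asym @ Z, Bm), inner(Asym, Z @ Bm @ Z.T)),
}
```

Each entry of `checks` should be `True` up to rounding error.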

3      Sum  of  the  largest  eigenvalues 

Let $A \in \mathcal{S}_n$ have eigenvalues

$$ \lambda_1 \ge \cdots \ge \lambda_n, $$

and let $Q \in \mathcal{O}_{n,n}$ be a matrix whose columns are normalized eigenvectors of $A$, so

$$ Q^T A Q = \Lambda \tag{3.1} $$

where $\Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_n)$. The matrix $Q$ will be regarded as fixed, although if $A$ has multiple eigenvalues the choice of $Q$ is not unique.


Let $\kappa \in \{1,\dots,n\}$. Section 3.1 establishes a max characterization of

$$ f_\kappa(A) = \sum_{i=1}^{\kappa} \lambda_i. \tag{3.2} $$

Thereafter we concentrate on the case when $A : \mathbf{R}^m \to \mathcal{S}_n$ is a smooth matrix-valued function and

$$ f_\kappa(x) = f_\kappa(A(x)). \tag{3.3} $$

Section 3.2 considers the differential properties of $f_\kappa(x)$, Section 3.3 gives necessary conditions for a local minimizer, Section 3.4 establishes a saddle point result, Section 3.5 gives a formula for the directional derivative, Section 3.6 discusses the generation of descent directions by splitting multiple eigenvalues and Section 3.7 considers a model algorithm for minimizing $f_\kappa(x)$.

3.1      A  max  characterization 

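Let

$$ \Phi_{n,\kappa} = \{\, U \in \mathcal{S}_n : 0 \le U \le I,\ \mathrm{tr}(U) = \kappa \,\}, \tag{3.4} $$

and

$$ \phi_{n,\kappa} = \{\, u \in \mathbf{R}^n : 0 \le u \le e,\ e^T u = \kappa \,\}. \tag{3.5} $$

Lemma 3.1 The sets $\Phi_{n,\kappa}$ and $\phi_{n,\kappa}$ are compact and convex. Moreover $\Phi_{n,\kappa}$ is invariant under orthogonal similarity transformations (i.e. $U \in \Phi_{n,\kappa} \iff Z^T U Z \in \Phi_{n,\kappa}$ where $Z \in \mathcal{O}_{n,n}$), and $\Phi_{n,\kappa}$ and $\phi_{n,\kappa}$ are related by

$$ \Phi_{n,\kappa} = \{\, U \in \mathcal{S}_n : U = Z D Z^T \text{ where } Z \in \mathcal{O}_{n,n},\ D = \mathrm{diag}(u_1,\dots,u_n) \text{ and } u \in \phi_{n,\kappa} \,\}, \tag{3.6} $$

and

$$ \phi_{n,\kappa} = \{\, u \in \mathbf{R}^n : u_i = U_{ii} \text{ for } i = 1,\dots,n \text{ where } U \in \Phi_{n,\kappa} \,\}. \tag{3.7} $$

Proof: That $\Phi_{n,\kappa}$ and $\phi_{n,\kappa}$ are compact convex sets is immediate. A spectral decomposition of $U$ yields (3.6) and the orthogonal invariance of $\Phi_{n,\kappa}$. The set $\phi_{n,\kappa}$ is contained in the right-hand side of (3.7) since for any $u \in \phi_{n,\kappa}$, $\mathrm{diag}(u_1,\dots,u_n) \in \Phi_{n,\kappa}$. To obtain the reverse inclusion, let $U \in \Phi_{n,\kappa}$ and define $u$ by $u_i = U_{ii}$. The facts that the trace is the sum of the diagonal elements and that a positive semi-definite matrix cannot have a negative diagonal element (applied to both $U$ and $I - U$) then show that $u \in \phi_{n,\kappa}$. ∎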


To characterize the elements that achieve the maximum in the following results, information about the multiplicity of the eigenvalues of $A$ is needed. Let

$$ \lambda_1 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_\kappa = \cdots = \lambda_{r+t} > \lambda_{r+t+1} \ge \cdots \ge \lambda_n, \tag{3.8} $$

where $t \ge 1$ and $r \ge 0$ are integers. The multiplicity of the $\kappa$th eigenvalue is $t$. The number of eigenvalues larger than $\lambda_\kappa$ is $r$. Here $r$ may be zero; in particular this must be the case if $\kappa = 1$. Note that by definition

$$ r + 1 \le \kappa \le r + t \le n, $$

so $t \ge \kappa - r$. Also, $t = 1$ implies that $\kappa = r + 1$.

First two lemmas, which depend only on the definitions of the sets $\Phi_{n,\kappa}$ and $\phi_{n,\kappa}$ and the ordering of the elements $\lambda_i$ in (3.8), are established. In particular they do not require $\lambda_i$ to be an eigenvalue of a matrix.

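Lemma 3.2 If the elements of $\lambda \in \mathbf{R}^n$ satisfy (3.8) then

$$ \max_{u \in \phi_{n,\kappa}} \lambda^T u = \sum_{i=1}^{\kappa} \lambda_i $$

with

$$ \mathrm{argmax}\,\{\, \lambda^T u : u \in \phi_{n,\kappa} \,\} = \Big\{\, u \in \mathbf{R}^n : u_i = 1,\ i = 1,\dots,r;\quad 0 \le u_i \le 1,\ i = r+1,\dots,r+t;\quad u_i = 0,\ i = r+t+1,\dots,n;\ \text{and } \sum_{i=r+1}^{r+t} u_i = \kappa - r \,\Big\}. $$

Proof: These results follow directly from (3.5) and (3.8). ∎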
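Lemma 3.2 describes a small linear program whose solution can be written down greedily. The sketch below returns one vertex of the argmax set and then checks a second, non-vertex maximizer for a vector with a tied block; the function name `max_over_phi` and the numerical data are illustrative assumptions.

```python
import numpy as np

def max_over_phi(lam, kappa):
    """Return (value, u) maximizing lam^T u over 0 <= u <= 1, sum(u) = kappa."""
    lam = np.asarray(lam, dtype=float)
    order = np.argsort(-lam)               # indices sorted by decreasing lam
    u = np.zeros_like(lam)
    u[order[:kappa]] = 1.0                 # put weight one on the kappa largest entries
    return float(lam @ u), u

lam = np.array([5.0, 3.0, 3.0, 3.0, 1.0]) # with kappa = 2: r = 1 and t = 3 in (3.8)
value, u = max_over_phi(lam, 2)

# Another maximizer: spread the remaining mass kappa - r = 1 over the tied block.
u_alt = np.array([1.0, 0.2, 0.5, 0.3, 0.0])
value_alt = float(lam @ u_alt)
```

Both `u` and `u_alt` attain the value $\lambda_1 + \lambda_2 = 8$, which illustrates why the maximizer is not unique when $\kappa < r + t$.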

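Lemma 3.3 Let $\Lambda = \mathrm{diag}(\lambda)$ where the elements of $\lambda \in \mathbf{R}^n$ satisfy (3.8). Then

$$ \max_{U \in \Phi_{n,\kappa}} \langle \Lambda, U \rangle = \sum_{i=1}^{\kappa} \lambda_i \tag{3.9} $$

and

$$ \mathrm{argmax}\,\{\, \langle \Lambda, U \rangle : U \in \Phi_{n,\kappa} \,\} = \Big\{\, U \in \mathcal{S}_n : U = \begin{bmatrix} I & 0 & 0 \\ 0 & \tilde{U} & 0 \\ 0 & 0 & 0 \end{bmatrix},\ \tilde{U} \in \Phi_{t,\kappa-r} \,\Big\} \tag{3.10} $$

where

$$ \Phi_{t,\kappa-r} = \{\, \tilde{U} \in \mathcal{S}_t : 0 \le \tilde{U} \le I \text{ and } \mathrm{tr}(\tilde{U}) = \kappa - r \,\}. \tag{3.11} $$

Here the diagonal blocks of $U$ have dimension $r$, $t$, and $n - r - t$ respectively.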

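Proof: $\Lambda = \mathrm{diag}(\lambda)$ so

$$ \langle \Lambda, U \rangle = \sum_{i=1}^{n} \lambda_i U_{ii}. \tag{3.12} $$

Hence (3.9) follows from combining Lemmas 3.1 and 3.2.

If $U^*$ is any element of the right hand side of (3.10) then from (3.8)

$$ \langle \Lambda, U^* \rangle = \sum_{i=1}^{r} \lambda_i + \lambda_\kappa\, \mathrm{tr}(\tilde{U}) = \sum_{i=1}^{\kappa} \lambda_i. $$

Now suppose $U^* \in \mathrm{argmax}\,\{\, \langle \Lambda, U \rangle : U \in \Phi_{n,\kappa} \,\}$. Then

$$ u^* = (U^*_{11},\dots,U^*_{nn})^T \in \phi_{n,\kappa} \tag{3.13} $$

satisfies $\lambda^T u^* = \sum_{i=1}^{\kappa} \lambda_i$ so from Lemma 3.2 it follows that

$$ U^*_{ii} = 1,\ i = 1,\dots,r; \qquad 0 \le U^*_{ii} \le 1,\ i = r+1,\dots,r+t; \qquad U^*_{ii} = 0,\ i = r+t+1,\dots,n; \qquad \sum_{i=r+1}^{r+t} U^*_{ii} = \kappa - r. \tag{3.14} $$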


Partition the rows and columns of $U^*$ into blocks of $r$, $t$ and $n - r - t$ elements so

$$ U^* = \begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{12}^T & C_{22} & C_{23} \\ C_{13}^T & C_{23}^T & C_{33} \end{bmatrix} \tag{3.15} $$

where $C_{11} \in \mathcal{S}_r$, $C_{22} \in \mathcal{S}_t$, $C_{33} \in \mathcal{S}_{n-r-t}$. As $U^* \in \Phi_{n,\kappa}$,

$$ 0 \le C_{ii} \le I,\ i = 1,2,3 \qquad \text{and} \qquad \sum_{i=1}^{3} \mathrm{tr}(C_{ii}) = \kappa. $$

From (3.14) and (3.15) the diagonal elements of $C_{11}$ are all 1 and the diagonal elements of $C_{33}$ are all zero. Suppose an off-diagonal element of $C_{11}$ is nonzero. Then $I - C_{11}$ is symmetric, has zero elements on the diagonal and a nonzero off-diagonal element, so is indefinite. This contradicts $C_{11} \le I$, so $C_{11} = I$. Similarly

$$ I - \begin{bmatrix} C_{11} & C_{12} \\ C_{12}^T & C_{22} \end{bmatrix} = \begin{bmatrix} 0 & -C_{12} \\ -C_{12}^T & I - C_{22} \end{bmatrix} \ge 0 $$

implies that $C_{12} = 0$. As the diagonal elements of $C_{33}$ are all zero the existence of a nonzero off-diagonal element in $C_{33}$ would contradict $C_{33} \ge 0$, so $C_{33} = 0$. Similarly it follows that $C_{13} = 0$ and $C_{23} = 0$ as the existence of a nonzero element in either of these submatrices contradicts $U^* \ge 0$. The only conditions remaining then are $0 \le C_{22} \le I$ with $\mathrm{tr}(C_{22}) = \kappa - r$ as required. ∎

Remark: The proof uses the basic results that if $C \in \mathcal{S}_n$ then

$$ C \ge 0,\ \mathrm{tr}(C) = 0 \implies C = 0, \tag{3.16} $$

$$ C \le I,\ \mathrm{tr}(C) = n \implies C = I, \tag{3.17} $$

which follow from the observation that if $C \ge 0$ and $C_{ii} = 0$ then $C_{ij} = 0$ and $C_{ji} = 0$ for $j = 1,\dots,n$.

Now consider sums of the eigenvalues of $A$. Let $P_1 \in \mathcal{O}_{n,r}$ be the matrix consisting of the first $r$ columns of $Q$ (defined in equation (3.1)), and let $Q_1 \in \mathcal{O}_{n,t}$ be the matrix consisting of the next $t$ columns of $Q$. By (3.8) we have

$$ P_1^T A P_1 = \mathrm{diag}(\lambda_1,\dots,\lambda_r); \qquad Q_1^T A Q_1 = \lambda_\kappa I. \tag{3.18} $$

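Theorem 3.4 Let $A \in \mathcal{S}_n$ have eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$. Then

$$ \max_{U \in \Phi_{n,\kappa}} \langle A, U \rangle = \sum_{i=1}^{\kappa} \lambda_i. \tag{3.19} $$

If the eigenvalues satisfy (3.8) then

$$ \mathrm{argmax}\,\{\, \langle A, U \rangle : U \in \Phi_{n,\kappa} \,\} = \{\, U \in \mathcal{S}_n : U = P_1 P_1^T + Q_1 \tilde{U} Q_1^T;\ \tilde{U} \in \Phi_{t,\kappa-r} \,\}, \tag{3.20} $$

where $P_1$, $Q_1$ satisfy (3.18).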

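Proof: For any $U \in \Phi_{n,\kappa}$ equation (3.1) and the properties of the Frobenius inner product imply that

$$ \langle A, U \rangle = \langle Q \Lambda Q^T, U \rangle = \langle \Lambda, Q^T U Q \rangle, \tag{3.21} $$

with $Q^T U Q \in \Phi_{n,\kappa}$ as $\Phi_{n,\kappa}$ is invariant to orthogonal transformations. Hence

$$ \max_{U \in \Phi_{n,\kappa}} \langle A, U \rangle = \max_{U \in \Phi_{n,\kappa}} \langle \Lambda, U \rangle = \sum_{i=1}^{\kappa} \lambda_i, $$

where the second equality follows from Lemma 3.3.

From (3.21) and the invariance of $\Phi_{n,\kappa}$ to orthogonal transformations

$$ \mathrm{argmax}\,\{\, \langle A, U \rangle : U \in \Phi_{n,\kappa} \,\} = \{\, Q U' Q^T : U' \in \Omega \,\} $$

where $\Omega = \mathrm{argmax}\,\{\, \langle \Lambda, U \rangle : U \in \Phi_{n,\kappa} \,\}$. The proof is completed by applying Lemma 3.3 since

$$ Q U' Q^T = Q \begin{bmatrix} I & 0 & 0 \\ 0 & \tilde{U} & 0 \\ 0 & 0 & 0 \end{bmatrix} Q^T = P_1 P_1^T + Q_1 \tilde{U} Q_1^T. $$

∎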


Remark: The term dual matrix is used to refer to either $U \in \Phi_{n,\kappa}$ or $\tilde{U} \in \Phi_{t,\kappa-r}$. The distinction between $U$ and $\tilde{U}$ is analogous to the question of whether or not to assign zero Lagrange multipliers to inactive constraints in nonlinear programming.


Remark: If $\kappa = r + t$ then $\tilde{U} = I$ and

$$ \mathrm{argmax}\,\{\, \langle A, U \rangle : U \in \Phi_{n,\kappa} \,\} = P_1 P_1^T + Q_1 Q_1^T. \tag{3.22} $$

The matrix achieving the maximum is unique if and only if $\kappa = r + t$. In particular this is the case if $t = 1$, i.e. the $\kappa$th largest eigenvalue is simple.

Remark: All the freedom in the choice of $Q_1$ in (3.18) is absorbed into the matrix $\tilde{U}$. It also makes no difference if any of the eigenvalues $\lambda_1,\dots,\lambda_r$ have multiplicity greater than 1, as all of these multiple eigenvalues are included in $f_\kappa(A)$. The corresponding columns of $P_1$ can be any orthonormal basis for the corresponding eigenspace, without affecting $P_1 P_1^T$.

Remark: As $A \in \mathcal{S}_n$ it follows that $\langle A, K \rangle = 0$ for any $K \in \mathcal{K}_n$ (the set of skew-symmetric $n$ by $n$ matrices). Hence

$$ f_\kappa(A) = \langle A, U + K \rangle $$

for any $U$ belonging to (3.20) and $K \in \mathcal{K}_n$.

Corollary 3.5 The function $f_\kappa(A) : \mathcal{S}_n \to \mathbf{R}$ is convex and its subdifferential $\partial f_\kappa(A)$ is the nonempty compact convex set

$$ \partial f_\kappa(A) = \{\, U \in \mathcal{S}_n : \exists\, \tilde{U} \in \Phi_{t,\kappa-r} \text{ with } U = P_1 P_1^T + Q_1 \tilde{U} Q_1^T \,\}. \tag{3.23} $$

Proof: For any $U \in \Phi_{n,\kappa}$ the inner product $\langle A, U \rangle$ is a linear function of $A \in \mathcal{S}_n$, so the convexity of $f_\kappa(A)$ follows from the max characterization in Theorem 3.4. Moreover, from Rockafellar (1970, Corollary 23.5.3), the subdifferential of a function defined as a pointwise maximum of a set of linear functions is the convex hull of the gradients of the linear functions achieving the maximum at the given point. Thus the subdifferential of $f_\kappa(A)$ is

$$ \partial f_\kappa(A) = \mathrm{conv}\,\{\, U \in \Phi_{n,\kappa} : \langle A, U \rangle = f_\kappa(A) \,\}. $$

The result follows from Theorem 3.4 as the set (3.20) is already convex. ∎
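A subgradient of the form (3.23) can be assembled from an eigendecomposition and tested against the subgradient inequality $f_\kappa(B) \ge f_\kappa(A) + \langle U, B - A \rangle$. The sketch below assumes a test matrix with prescribed eigenvalues 5, 3, 3, 3, 1, a tolerance `tol` for deciding which computed eigenvalues count as equal, and one arbitrary feasible choice of the small dual matrix; all of these are illustrative assumptions rather than part of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, kappa = 5, 2
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q0 @ np.diag([5.0, 3.0, 3.0, 3.0, 1.0]) @ Q0.T
A = (A + A.T) / 2                               # remove rounding asymmetry

def f_kappa(B, k):
    return np.linalg.eigvalsh(B)[-k:].sum()

lam, Q = np.linalg.eigh(A)
lam, Q = lam[::-1], Q[:, ::-1]                  # descending order, as in (3.8)
tol = 1e-8                                      # assumed threshold for "equal" eigenvalues
r = int(np.sum(lam > lam[kappa - 1] + tol))     # eigenvalues strictly above lambda_kappa
t = int(np.sum(np.abs(lam - lam[kappa - 1]) <= tol))   # multiplicity of lambda_kappa
P1, Q1 = Q[:, :r], Q[:, r:r + t]

U_tilde = np.diag([0.2, 0.5, 0.3])              # 0 <= U_tilde <= I, trace = kappa - r = 1
U = P1 @ P1.T + Q1 @ U_tilde @ Q1.T             # element of (3.23)

value = np.trace(A @ U)                         # should equal f_kappa(A) = 5 + 3
slack = []
for _ in range(200):
    M = rng.standard_normal((n, n))
    B = (M + M.T) / 2
    slack.append(f_kappa(B, kappa) - f_kappa(A, kappa) - np.trace(U @ (B - A)))
min_slack = min(slack)                          # nonnegative if U is a subgradient
```

Any other symmetric `U_tilde` with eigenvalues in $[0,1]$ and trace $\kappa - r$ could be substituted without changing the outcome of the check.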

Remark: As previously noted any skew-symmetric matrix can be added to $U$ without affecting the inner product $\langle A, U \rangle$. As $f_\kappa(A)$ is defined on $\mathcal{S}_n$ only the symmetric subdifferential is of interest.

Remark: In the case $\kappa = 1$, the condition $U \le I$ is unnecessary since it is implied by $U \ge 0$, $\mathrm{tr}(U) = 1$. This case is moderately well known; see Fletcher (1985). Fletcher also addressed the case $\kappa > 1$ in an appendix, but his Theorem A.4 is incorrect in the case $\kappa > 1$ since the condition $U \le I$ was omitted.

This  section  is  concluded  by  relating  our  max  characterization  to  the  well  known 
result  of  Fan  (1949). 

Lemma 3.6 The extreme points of $\phi_{n,\kappa}$ are

$$ \Big\{\, u : u_i = \begin{cases} 1 & \text{for exactly } \kappa \text{ of the indices } 1,\dots,n; \\ 0 & \text{otherwise} \end{cases} \,\Big\}. \tag{3.24} $$

Proof: Straightforward. ∎

Lemma 3.7 The extreme points of $\Phi_{n,\kappa}$ are the elements in $\Phi_{n,\kappa}$ with rank $\kappa$, that is the set of matrices $U \in \Phi_{n,\kappa}$ with $\kappa$ eigenvalues equal to 1, and $n - \kappa$ eigenvalues equal to zero.

Proof: Matrices in $\Phi_{n,\kappa}$ must have rank at least $\kappa$. Since $\Phi_{n,\kappa}$ and $\phi_{n,\kappa}$ are related by (3.6), it is straightforward to show that any element of $\Phi_{n,\kappa}$ with rank greater than $\kappa$ is not an extreme point. The only candidates for extreme points, then, are those with rank $\kappa$. But it is clearly not possible that some rank $\kappa$ elements are extreme points and others not, since the definition of $\Phi_{n,\kappa}$ does not in any way distinguish between different rank $\kappa$ elements. Since a compact convex set must have extreme points, the proof is complete. ∎

Fan's  theorem  now  follows  as  a  corollary. 


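Theorem 3.8 (Fan)

$$ \max_{Z \in \mathcal{O}_{n,\kappa}} \mathrm{tr}(Z^T A Z) = \sum_{i=1}^{\kappa} \lambda_i. \tag{3.25} $$

Proof: Since a linear function on a convex set must assume its maximum at an extreme point, combining Theorem 3.4 with the lemma just given shows that

$$ \max_{U \in \Phi_{n,\kappa},\ \mathrm{rank}\, U = \kappa} \langle A, U \rangle = \sum_{i=1}^{\kappa} \lambda_i. \tag{3.26} $$

Such matrices $U$ have precisely the form $Z Z^T$, where $Z \in \mathcal{O}_{n,\kappa}$. The proof is completed by noting that

$$ \langle A, Z Z^T \rangle = \langle Z^T A Z, I \rangle = \mathrm{tr}(Z^T A Z). \tag{3.27} $$

∎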
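The two ingredients of this proof, the 0/1 spectrum of $Z Z^T$ and the identity (3.27), can be observed on a random instance. The following sketch generates a matrix $Z$ with orthonormal columns by a QR factorization, which is an illustrative choice; the variable names are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, kappa = 6, 3
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                       # random symmetric matrix
f = np.linalg.eigvalsh(A)[-kappa:].sum()                # f_kappa(A)

Z, _ = np.linalg.qr(rng.standard_normal((n, kappa)))   # n x kappa, orthonormal columns
U = Z @ Z.T                                             # rank-kappa element of the feasible set
eigs_U = np.linalg.eigvalsh(U)                          # ascending: n - kappa zeros, kappa ones
lhs = np.trace(A @ U)                                   # <A, Z Z^T>
rhs = np.trace(Z.T @ A @ Z)                             # tr(Z^T A Z), as in (3.27)
```

For a generic $Z$ the value `rhs` falls below `f`; equality holds when the columns of $Z$ span an invariant subspace for the $\kappa$ largest eigenvalues.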

Remark: Another expression for the subdifferential is

$$ \partial f_\kappa(A) = \mathrm{conv}\,\{\, Z Z^T : \text{columns of } Z \text{ form an orthonormal set of } \kappa \text{ eigenvectors for } \lambda_1,\dots,\lambda_\kappa \,\}. \tag{3.28} $$

Note that these are just the elements that achieve the maximum in Fan's theorem, and that the elements whose convex hull is being taken are the extreme points of (3.23). Although simpler to write than (3.23) this expression is not as useful, as the structure of the subdifferential is not apparent.

Remark: H. Woerdeman and C.-K. Li recently informed us that the equality

$$ \mathrm{conv}\,\{\, Z Z^T : Z \in \mathcal{O}_{n,\kappa} \,\} = \Phi_{n,\kappa} $$

appeared in Fillmore and Williams (1971). This implies the equivalence of (3.19) and (3.25), although it does not by itself imply either of them; nor, strictly speaking, does it imply Lemma 3.7.

3.2      The  generalized  gradient 

Let $A(x) : \mathbf{R}^m \to \mathcal{S}_n$ be a smooth (at least once continuously differentiable) function whose partial derivative with respect to $x_k$ is

$$ A_k(x) = \frac{\partial A(x)}{\partial x_k} \qquad \text{for } k = 1,\dots,m. $$

This section is concerned with finding a computationally useful characterization of the generalized gradient of the function

$$ f_\kappa(x) = f_\kappa(A(x)). $$

Although the eigenvalues and eigenvectors of $A(x)$ are functions of $x \in \mathbf{R}^m$, the explicit dependence on $x$ will usually be omitted. Thus the eigenvalues of $A(x)$ are denoted as before by (3.8), with $r$ and $t$ now dependent on $x$, and with corresponding eigenvectors satisfying (3.18).
Theorem 3.9 The function $f_\kappa(x)$ is locally Lipschitz, subdifferentially regular, and its generalized gradient is the nonempty compact convex set

$$ \partial f_\kappa(x) = \{\, v \in \mathbf{R}^m : \exists\, \tilde{U} \in \mathcal{S}_t \text{ with } 0 \le \tilde{U} \le I,\ \mathrm{tr}(\tilde{U}) = \kappa - r, \text{ and } v_k = \mathrm{tr}(P_1^T A_k(x) P_1) + \langle Q_1^T A_k(x) Q_1, \tilde{U} \rangle,\ k = 1,\dots,m \,\}. \tag{3.29} $$

Proof: Since $f_\kappa(A)$ is convex and $A(x)$ is smooth the chain rule (Clarke (1983), Theorem 2.3.10) implies that $f_\kappa(x)$ is locally Lipschitz, subdifferentially regular, and that

$$ \partial f_\kappa(x) = \{\, v \in \mathbf{R}^m : v_k = \langle A_k(x), U \rangle,\ k = 1,\dots,m, \text{ where } U \in \partial f_\kappa(A(x)) \,\}. \tag{3.30} $$

Corollary 3.5 and the properties of the inner product complete the proof. ∎

Remark: Since by Theorem 3.4

$$ f_\kappa(x) = \max_{U \in \Phi_{n,\kappa}} \langle A(x), U \rangle, $$

the result also follows from the characterization of generalized gradients of functions defined by a pointwise maximum in Clarke (1983, Theorem 2.8.6). The Clarke characterization shows that

$$ \partial f_\kappa(x) = \mathrm{conv}\,\{\, v \in \mathbf{R}^m : v_k = \langle A_k(x), U \rangle,\ k = 1,\dots,m, \text{ where } U \in \Phi_{n,\kappa},\ \langle A(x), U \rangle = f_\kappa(x) \,\}. \tag{3.31} $$

Equation (3.29) follows from Theorem 3.4 since the maximizing set is already convex.

Remark: Bellman and Fan (1963) gave an example where the set

$$ \{\, v \in \mathbf{R}^m : v_k = \langle A_k, U \rangle, \text{ where } U \ge 0 \,\} $$

is not closed, and gave sufficient conditions for it to be closed. This is not a difficulty in our case because the trace condition ensures that $\Phi_{n,\kappa}$ is compact. As $f_\kappa(x) : \mathbf{R}^m \to \mathbf{R}$ is a locally Lipschitz function its generalized gradient is a nonempty compact convex set in $\mathbf{R}^m$.

Corollary 3.10 If $\lambda_\kappa > \lambda_{\kappa+1}$ (i.e. $\kappa = r + t$) the function $f_\kappa$ is differentiable at $x$ with

$$ \frac{\partial f_\kappa(x)}{\partial x_k} = \mathrm{tr}(P_1^T A_k(x) P_1) + \mathrm{tr}(Q_1^T A_k(x) Q_1). \tag{3.32} $$

Proof: As $\kappa - r = t$ the only solution to $\mathrm{tr}(\tilde{U}) = t$ and $0 \le \tilde{U} \le I$ is $\tilde{U} = I$. Hence the set $\partial f_\kappa(x)$ is a singleton, $f_\kappa(x)$ is differentiable, and (3.29) reduces to (3.32). ∎
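Formula (3.32) can be compared with a finite-difference approximation on a small affine family $A(x) = A_0 + x_1 A_1 + x_2 A_2$. In the sketch below the matrices `A0` and `A_k`, the evaluation point and the step `h` are illustrative assumptions, chosen so that $\lambda_\kappa$ and $\lambda_{\kappa+1}$ are well separated.

```python
import numpy as np

A0 = np.diag([4.0, 3.0, 2.0, 1.0])
A_k = [np.array([[0.0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]),
       np.diag([1.0, -1.0, 1.0, -1.0])]
kappa = 2

def A_of(x):
    return A0 + sum(xk * Ak for xk, Ak in zip(x, A_k))

def f(x):
    return np.linalg.eigvalsh(A_of(x))[-kappa:].sum()

def gradient(x):
    """Formula (3.32); meaningful only when lambda_kappa > lambda_{kappa+1}."""
    lam, Q = np.linalg.eigh(A_of(x))            # ascending order
    if lam[-kappa] - lam[-kappa - 1] <= 1e-8:
        raise ValueError("kappa-th eigenvalue is not separated; use (3.29) instead")
    Z = Q[:, -kappa:]                           # columns of [P1 Q1] together
    # tr(P1^T A_k P1) + tr(Q1^T A_k Q1) equals tr(Z^T A_k Z)
    return np.array([np.trace(Z.T @ Ak @ Z) for Ak in A_k])

x = np.array([0.1, 0.2])
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))])
grad = gradient(x)
```

The guard inside `gradient` reflects the hypothesis of the corollary: when the $\kappa$th eigenvalue is tied with the next one, the function is in general not differentiable and a single gradient vector does not exist.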

Remark: A key point here is that if $\lambda_i$ is an eigenvalue of $A(x)$ with multiplicity $t$ there exists a neighbourhood of $x$ in which the corresponding group of $t$ eigenvalues is distinct from all the other eigenvalues. The sum of all the eigenvalues in this group is a differentiable function in this neighbourhood (see Kato (1982)). In particular, simple eigenvalues are smooth functions of $x$.

Remark: Cullum, Donath and Wolfe (1975) studied the case where $A(x)$ is affine and only the diagonal elements of $A(x)$ vary, so that $m = n$,

$$ A(x) = A_0 + \mathrm{diag}(x_1,\dots,x_n), $$

and $A_k(x) = e_k e_k^T$ for $k = 1,\dots,n$. Using Fan's theorem they showed that $f_\kappa(x)$ is differentiable at $x$ when $\lambda_\kappa > \lambda_{\kappa+1}$, and that

$$ \partial f_\kappa(x) = \mathrm{conv}\,\{\, v \in \mathbf{R}^n : v_k = \mathrm{tr}(P_1^T A_k(x) P_1) + \mathrm{tr}(Z^T Q_1^T A_k(x) Q_1 Z) \text{ for } k = 1,\dots,m \text{ and } Z \in \mathcal{O}_{t,\kappa-r} \,\}. \tag{3.33} $$

The relationship between (3.33) and (3.29) is precisely that already explained between (3.28) and (3.23), namely the argument of the convex hull in (3.33) is the set of extreme points of (3.29).

3.3      Necessary  conditions 

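The standard necessary condition for $x$ to be a local minimizer of $f_\kappa$, namely $0 \in \partial f_\kappa(x)$ (see Clarke (1983)), implies there exists

$$ U \in \mathrm{argmax}\,\{\, \langle A(x), U \rangle : U \in \Phi_{n,\kappa} \,\} \tag{3.34} $$

such that

$$ \langle A_k, U \rangle = 0 \qquad \text{for } k = 1,\dots,m. \tag{3.35} $$

From Theorem 3.9 equations (3.34) and (3.35) are equivalent to the existence of a $\tilde{U} \in \mathcal{S}_t$ such that

$$ 0 \le \tilde{U} \le I, \qquad \mathrm{tr}(\tilde{U}) = \kappa - r, \tag{3.36} $$

and

$$ \mathrm{tr}(P_1^T A_k P_1) + \langle Q_1^T A_k Q_1, \tilde{U} \rangle = 0 \qquad \text{for } k = 1,\dots,m. \tag{3.37} $$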

The conditions (3.36) and (3.37) are useful computationally as one can relax the inequalities on $\tilde{U}$ and solve (3.37) together with $\mathrm{tr}(\tilde{U}) = \kappa - r$ for $\tilde{U}$. This requires solving a system of $m + 1$ linear equations for the $t(t+1)/2$ unknowns in the symmetric matrix $\tilde{U}$. If the inequalities $0 \le \tilde{U} \le I$ are not satisfied then a descent direction may be generated. This is discussed in Section 3.6.

If $f_\kappa$ is convex (for example if $A(x)$ is affine) then equations (3.36) and (3.37) are both necessary and sufficient for $x$ to be a minimizer of $f_\kappa$.
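The linear system formed by (3.37) and the trace condition can be set up over the upper triangle of the symmetric unknown. The sketch below is one possible way of doing this, assuming a tolerance `tol` for detecting the multiplicity $t$ and using a least-squares solve; the function name `dual_matrix` and the example $A(x) = \mathrm{diag}(x, -x)$ with $\kappa = 1$ are illustrative and not taken from the paper. When the system is underdetermined the least-squares routine returns only one particular solution, so a violated inequality for that solution does not by itself show that no feasible dual matrix exists.

```python
import numpy as np

def dual_matrix(A, A_k, kappa, tol=1e-8):
    """Solve tr(U) = kappa - r together with (3.37) for a symmetric t x t matrix U."""
    lam, Q = np.linalg.eigh(A)
    lam, Q = lam[::-1], Q[:, ::-1]                      # descending, as in (3.8)
    r = int(np.sum(lam > lam[kappa - 1] + tol))
    t = int(np.sum(np.abs(lam - lam[kappa - 1]) <= tol))
    P1, Q1 = Q[:, :r], Q[:, r:r + t]

    iu = np.triu_indices(t)                             # unknowns: upper triangle of U
    weight = np.where(iu[0] == iu[1], 1.0, 2.0)         # off-diagonals occur twice in <B, U>
    rows = [np.eye(t)[iu]]                              # trace condition
    rhs = [float(kappa - r)]
    for Ak in A_k:                                      # one equation (3.37) per parameter
        Bk = Q1.T @ Ak @ Q1
        rows.append(weight * Bk[iu])
        rhs.append(-np.trace(P1.T @ Ak @ P1))
    M, rhs = np.array(rows), np.array(rhs)

    sol = np.linalg.lstsq(M, rhs, rcond=None)[0]        # minimum-norm least-squares solution
    U = np.zeros((t, t))
    U[iu] = sol
    U = U + np.triu(U, 1).T                             # fill in the lower triangle
    return U, float(np.linalg.norm(M @ sol - rhs))

# Example: A(x) = diag(x, -x), so f_1(x) = |x|; test the point x = 0.
A_k = [np.diag([1.0, -1.0])]
A = np.zeros((2, 2))
U, residual = dual_matrix(A, A_k, kappa=1)
eigs = np.linalg.eigvalsh(U)
stationary = residual < 1e-10 and eigs.min() >= -1e-10 and eigs.max() <= 1 + 1e-10
```

In this example the computed dual matrix is half the identity, which satisfies the inequalities in (3.36), in agreement with $x = 0$ minimizing $|x|$.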
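3.4      A  saddle  point  result

Consider the function

$$ \mathcal{L}(x, U) = \langle A(x), U \rangle. \tag{3.38} $$

This section establishes a saddle point result, based on well known results for Lagrangian functions in convex programming (Rockafellar (1970)). A point $x^* \in \mathbf{R}^m$, $U^* \in \Phi_{n,\kappa}$ is a saddle point of $\mathcal{L}(x, U)$ if and only if

$$ \mathcal{L}(x^*, U) \le \mathcal{L}(x^*, U^*) \le \mathcal{L}(x, U^*) \qquad \forall x \in \mathbf{R}^m \quad \forall U \in \Phi_{n,\kappa}. \tag{3.39} $$

It is well known and easy to show that

$$ \min_{x \in \mathbf{R}^m} \max_{U \in \Phi_{n,\kappa}} \mathcal{L}(x, U) \ge \max_{U \in \Phi_{n,\kappa}} \min_{x \in \mathbf{R}^m} \mathcal{L}(x, U). \tag{3.40} $$

The primal problem is

$$ \min_{x \in \mathbf{R}^m} f_\kappa(x) \tag{3.41} $$

where

$$ f_\kappa(x) = \max_{U \in \Phi_{n,\kappa}} \mathcal{L}(x, U). \tag{3.42} $$

Define the dual problem to be

$$ \max_{U \in \Phi_{n,\kappa}} h(U), \tag{3.43} $$

where

$$ h(U) = \min_{x \in \mathbf{R}^m} \mathcal{L}(x, U). \tag{3.44} $$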

The following saddle point result is similar to that of Shapiro (1985) for minimizing a function of a symmetric matrix subject to positive semi-definite constraints. Indeed, it follows from the general saddle point theory for convex-concave functions (Rockafellar (1970), Theorem 36.3), but we give the proof for completeness.

Theorem 3.11 For each $U \in \Phi_{n,\kappa}$ let $\mathcal{L}(\cdot, U)$ be a convex function, and let the primal problem attain its solution at $x^*$. Then the primal and dual problems have the same optimal value so

$$ \min_{x \in \mathbf{R}^m} \max_{U \in \Phi_{n,\kappa}} \mathcal{L}(x, U) = \max_{U \in \Phi_{n,\kappa}} \min_{x \in \mathbf{R}^m} \mathcal{L}(x, U), \tag{3.45} $$

and $U^*$ satisfying (3.34) and (3.35) solves the dual problem (3.43).

Proof: From (3.42) $f_\kappa(x)$ is a convex function as it is a maximum of convex functions, so $0 \in \partial f_\kappa(x^*)$. Hence there exists a $U^*$ satisfying equations (3.34) and (3.35). We only have to show that $(x^*, U^*)$ is a saddle point of $\mathcal{L}(x, U)$. From Theorem 3.4 and (3.34)

$$ \mathcal{L}(x^*, U^*) = f_\kappa(x^*) \ge \mathcal{L}(x^*, U) \qquad \forall\, U \in \Phi_{n,\kappa}. $$

The function $\mathcal{L}(\cdot, U^*)$ is convex, so $0 \in \partial f_\kappa(x^*)$, through (3.35), implies that $\mathcal{L}(\cdot, U^*)$ attains its minimum at $x^*$. This establishes (3.39). ∎

Remark: As $A(x)$ is a smooth function of $x$, so is $\mathcal{L}(\cdot,U)$ for
any $U$. Hence a necessary condition for $x^*$ to be a local minimizer of
$\mathcal{L}(\cdot,U)$ is that

\[ \langle A_k(x^*), U \rangle = 0 \quad \text{for } k = 1,\ldots,m. \tag{3.46} \]

If $A(x)$ is an affine function then $\mathcal{L}(\cdot,U)$ is also an affine
(and hence convex) function. In this case $\mathcal{L}(\cdot,U)$ does not have a
finite minimum unless it is constant, so (3.46) holds for all $x \in \mathbb{R}^m$.
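
To make the preceding Remark concrete, the dual problem can be written out for the affine case. The sketch below assumes the parametrization $A(x) = A_0 + \sum_k x_k A_k$; the name $A_0$ for the constant term is introduced here purely for illustration and is not notation from the text above.

```latex
% Assumption: A(x) = A_0 + \sum_{k=1}^m x_k A_k, where A_0 is a hypothetical
% name for the constant term (so that \partial A / \partial x_k = A_k).
\begin{align*}
\mathcal{L}(x,U) &= \langle A_0, U \rangle + \sum_{k=1}^m x_k \langle A_k, U \rangle, \\
h(U) = \inf_{x \in \mathbb{R}^m} \mathcal{L}(x,U) &=
  \begin{cases}
    \langle A_0, U \rangle & \text{if } \langle A_k, U \rangle = 0 \text{ for } k = 1,\ldots,m, \\
    -\infty & \text{otherwise,}
  \end{cases}
\end{align*}
so that, under this assumption, the dual problem (3.43) reads
\[
  \max \; \langle A_0, U \rangle
  \quad \text{subject to} \quad
  \langle A_k, U \rangle = 0, \; k = 1,\ldots,m, \qquad U \in \Phi_{n,\kappa}.
\]
```

The infimum is written in place of the minimum in (3.44) only to allow the value $-\infty$ when $\mathcal{L}(\cdot,U)$ is a nonconstant affine function.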


3.5      The  directional  derivative 

As $f_\kappa(x)$ is subdifferentially regular the standard one-sided directional
derivative at $x$ in a direction $d \in \mathbb{R}^m$ exists and satisfies

\begin{align*}
f'_\kappa(x;d) &= \lim_{\alpha \to 0^+} \frac{f_\kappa(x + \alpha d) - f_\kappa(x)}{\alpha} \\
  &= \max_{u \in \partial f_\kappa(x)} u^T d \\
  &= \sum_{k=1}^m d_k \, \mathrm{tr}(P_1^T A_k P_1)
     + \max_{U \in \Phi_{t,\kappa-r}} \sum_{k=1}^m d_k \langle Q_1^T A_k Q_1, U \rangle. \tag{3.47}
\end{align*}

Recall that the matrices $P_1$ and $Q_1$, defined by (3.18), are evaluated at the
point $x$, and that $A_k$ is the partial derivative of $A(x)$ with respect to
$x_k$ evaluated at the point $x$. Define

\[ b_k = \mathrm{tr}(P_1^T A_k P_1), \qquad B_k = Q_1^T A_k Q_1
   \qquad \text{for } k = 1,\ldots,m \tag{3.48} \]

and define $B(d) \in \mathcal{S}_t$ by

\[ B(d) = \sum_{k=1}^m d_k B_k. \tag{3.49} \]

Note that $b^T d$ is the sum of the eigenvalues (the trace) of

\[ \sum_{k=1}^m d_k P_1^T A_k P_1. \]

Let the eigenvalues of $B(d)$ be $\beta_1 \ge \cdots \ge \beta_t$. Then from
(3.47) and Theorem 3.4 it follows that

\[ f'_\kappa(x;d) = b^T d + \sum_{i=1}^{\kappa-r} \beta_i. \tag{3.50} \]

Hence $f'_\kappa(x;d)$ is the sum of all $r$ eigenvalues of
$\sum_{k=1}^m d_k P_1^T A_k P_1$ plus the sum of the $\kappa - r$ largest
eigenvalues of $\sum_{k=1}^m d_k Q_1^T A_k Q_1$. Note that, unless
$\kappa = r + t$, this is generally not the same as the sum of the $\kappa$
largest eigenvalues of

\[ \sum_{k=1}^m d_k \, [P_1 \; Q_1]^T A_k \, [P_1 \; Q_1]. \]


Remark: Classical eigenvalue perturbation theory (Kato (1982)) states that, in
general, as $A(x)$ is perturbed by $\alpha \sum_k d_k A_k + o(\alpha)$, the multiple
eigenvalue $\lambda_{r+1} = \cdots = \lambda_{r+t}$ splits, with the $t$ perturbed
eigenvalues having first-order changes given, respectively, by the $t$
eigenvalues of $B(d)$. This, together with the fact that the sum of the first $r$
eigenvalues, being separated from $\lambda_{r+1}$, is a smooth function, provides
an alternative proof of (3.50). However, the proof of these classical results is
by no means straightforward.
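
The formula (3.50) can be checked numerically against a one-sided finite difference. The following is a minimal sketch, assuming NumPy is available and an affine family $A(x) = A_0 + \sum_k x_k A_k$ evaluated at $x = 0$; the function names, the test matrices and the tolerance `tol` used to detect the multiplicity are all illustrative choices, not part of the paper.

```python
import numpy as np

def f_kappa(A, kappa):
    """Sum of the kappa largest eigenvalues of the symmetric matrix A."""
    return float(np.sort(np.linalg.eigvalsh(A))[::-1][:kappa].sum())

def dir_derivative(A0, A_list, kappa, d, tol=1e-8):
    """Evaluate (3.50) at a point where A(x) = A0 and dA/dx_k = A_list[k].

    r and t are read off the spectrum of A0 using the (hypothetical)
    separation tolerance `tol`.
    """
    lam, Q = np.linalg.eigh(A0)
    lam, Q = lam[::-1], Q[:, ::-1]                  # descending order, as in (1.1)
    lam_k = lam[kappa - 1]
    r = int(np.sum(lam > lam_k + tol))              # eigenvalues strictly above lambda_kappa
    t = int(np.sum(np.abs(lam - lam_k) <= tol))     # multiplicity of lambda_kappa
    P1, Q1 = Q[:, :r], Q[:, r:r + t]
    Ad = sum(dk * Ak for dk, Ak in zip(d, A_list))  # sum_k d_k A_k
    bTd = np.trace(P1.T @ Ad @ P1)                  # b^T d, from (3.48)
    Bd = Q1.T @ Ad @ Q1                             # B(d), from (3.49)
    beta = np.sort(np.linalg.eigvalsh(Bd))[::-1]
    return float(bTd + beta[:kappa - r].sum())

# Illustrative data: eigenvalues (3, 1, 1) and kappa = 2, so r = 1 and t = 2.
A0 = np.diag([3.0, 1.0, 1.0])
A_list = [np.array([[1.0, 2.0, 0.0], [2.0, 0.0, 1.0], [0.0, 1.0, -1.0]]),
          np.array([[0.0, 1.0, 1.0], [1.0, 2.0, 0.0], [1.0, 0.0, 1.0]])]
d = np.array([0.3, -0.7])
kappa = 2

formula = dir_derivative(A0, A_list, kappa, d)
alpha = 1e-7
Ad = sum(dk * Ak for dk, Ak in zip(d, A_list))
finite_diff = (f_kappa(A0 + alpha * Ad, kappa) - f_kappa(A0, kappa)) / alpha
print(formula, finite_diff)
```

Because the point is nonsmooth ($\kappa < r + t$), only the forward difference is meaningful here; a central difference would average two different one-sided slopes.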

3.6      Splitting  multiple  eigenvalues 

Given $x$, it is desired to either (a) generate a descent direction for
$f_\kappa$, or (b) demonstrate that $x$ satisfies the first-order condition for
optimality. If $\kappa = r + t$ then $f_\kappa(x)$ is differentiable;
consequently it is sufficient to examine the gradient, whose entries are given
by (3.29). If the gradient is zero, the first-order optimality conditions hold;
conversely if it is not zero, the negative gradient provides a descent
direction. We therefore consider only the nonsmooth case $\kappa < r + t$ in the
remainder of this section, although the following results actually apply in
general (with a slight modification in the second case). When a descent
direction exists, we are not particularly interested in obtaining the steepest
descent direction, since it is generally advantageous to maintain the correct
multiplicity when possible (by analogy with active set methods for constrained
optimization). In the first of the next three cases a descent direction is
generated keeping $\lambda_\kappa$ of multiplicity $t$ (to first order), while in
the second case, generation of a descent direction requires splitting the group
of eigenvalues corresponding to $\lambda_\kappa$.

Case 1. $I \in \mathrm{Span}\{ B_1, \ldots, B_m \}$.

Solve the system

\[ \delta I - \sum_{k=1}^m d_k B_k = 0 \tag{3.51} \]

\[ (\kappa - r)\delta + \sum_{k=1}^m d_k b_k = -1. \tag{3.52} \]


This is a system of $t(t+1)/2 + 1$ linear equations in $m + 1$ unknowns
$\delta, d_1, \ldots, d_m$. Equation (3.51) implies that the eigenvalues of
$B(d)$ defined by (3.49) are all equal to $\delta$. The system is solvable since
(3.51) is solvable for any $\delta$ by assumption, and (3.52) scales this
solution. Hence, from equations (3.50) and (3.52), $f'_\kappa(x;d) = -1$, where
the direction $d \in \mathbb{R}^m$ has components $d_1, \ldots, d_m$. Note that
the $-1$ on the right-hand side of (3.52) is just a normalization constant and
can be replaced by any $\eta < 0$ giving $f'_\kappa(x;d) = \eta$. To first order
all the eigenvalues $\lambda_{r+1}(x), \ldots, \lambda_{r+t}(x)$ decrease at the
same rate along $d$, and $\delta$ gives a first order estimate of the change in
their common value. Case 1 holds generically if $m \ge t(t+1)/2$, that is the
generic dimension of the manifold defined by

\[ \lambda_{r+1}(x) = \cdots = \lambda_{r+t}(x) \tag{3.53} \]

is greater than zero (see Overton and Womersley (1988), Section 2 for more detail).
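
Case 1 amounts to one small linear solve. The sketch below, assuming NumPy, stacks the upper triangle of (3.51) together with (3.52) and solves for $(\delta, d)$ by least squares; the helper name `case1_direction` and the data `B_list`, `b` are hypothetical and chosen only so that $I$ lies in the span of the $B_k$.

```python
import numpy as np

def case1_direction(B_list, b, kappa_minus_r):
    """Solve (3.51)-(3.52) for (delta, d) by least squares.

    B_list holds the t-by-t matrices B_k and b the scalars b_k from (3.48).
    The returned residual is (numerically) zero when Case 1 applies.
    """
    t = B_list[0].shape[0]
    iu = np.triu_indices(t)                 # (3.51) is symmetric: upper triangle suffices
    n_eq = len(iu[0]) + 1                   # t(t+1)/2 + 1 equations
    M = np.zeros((n_eq, len(B_list) + 1))   # unknowns ordered (delta, d_1, ..., d_m)
    M[:-1, 0] = np.eye(t)[iu]
    for k, Bk in enumerate(B_list):
        M[:-1, k + 1] = -Bk[iu]
    M[-1, 0] = kappa_minus_r
    M[-1, 1:] = b
    rhs = np.zeros(n_eq)
    rhs[-1] = -1.0                          # normalization in (3.52)
    sol = np.linalg.lstsq(M, rhs, rcond=None)[0]
    residual = float(np.linalg.norm(M @ sol - rhs))
    return sol[0], sol[1:], residual

# Illustrative data with t = 2, m = 3 and kappa - r = 1.
B_list = [np.array([[1.0, 0.0], [0.0, 0.0]]),
          np.array([[0.0, 1.0], [1.0, 0.0]]),
          np.array([[0.0, 0.0], [0.0, 1.0]])]
b = np.array([0.5, 0.0, -0.2])
delta, d, residual = case1_direction(B_list, b, 1)
Bd = sum(dk * Bk for dk, Bk in zip(d, B_list))
derivative = float(b @ d + np.linalg.eigvalsh(Bd)[-1])   # (3.50) with kappa - r = 1
print(delta, d, derivative)
```

Least squares is used rather than a square solve because the system generally has more unknowns than equations in Case 1; a nonzero residual signals that $I$ is not in the span and that Case 2 or Case 3 should be examined instead.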

Case 2. Case 1 does not apply and the span of the $m + 1$ vectors in
$\mathbb{R}^{t(t+1)/2}$ associated with $I, B_1, \ldots, B_m$ has the maximum
dimension $t(t+1)/2$.

Solve the linear system

\[ \mathrm{tr}(U) = \kappa - r \tag{3.54} \]

\[ -\langle B_k, U \rangle = b_k, \qquad k = 1,\ldots,m \tag{3.55} \]

for the dual matrix $U \in \mathcal{S}_t$. Note that the trace condition (3.54) is
equivalent to $\langle I, U \rangle = \kappa - r$. Since the $\{B_k\}$ may not form
a linearly independent set, (3.55) may be replaced by considering only a maximal
independent set of $\{B_k\}$; the system cannot be inconsistent because of the
related definitions of the left and right-hand sides ($B_k$ and $b_k$). By the
rank assumption, the resulting linear system is square and nonsingular, with
order $t(t+1)/2$, and has a unique solution $U$. If $U$ satisfies
$0 \le U \le I$ then $0 \in \partial f_\kappa(x)$, so $x$ satisfies the
first-order necessary conditions for a minimum. If these inequalities on $U$ are
not satisfied then the following result shows how to generate a descent
direction.

Theorem 3.12 Suppose (3.54) and (3.55) are satisfied but
$0 \notin \partial f_\kappa(x)$, so $U$ has an eigenvalue $\theta$ outside
$[0,1]$. Let $z \in \mathbb{R}^t$ be the corresponding normalized eigenvector of
$U$. Choose $\beta \in \mathbb{R}$ so that $\beta < 0$ if $\theta > 1$ and
$\beta > 0$ if $\theta < 0$. Solve

\[ \delta I - \sum_{k=1}^m d_k B_k = \beta z z^T. \tag{3.56} \]

Then $d = [d_1, \ldots, d_m]^T$ is a descent direction.

Proof: The linear system (3.56) is solvable by hypothesis, although $d$ is unique
only if the $\{B_k\}$ are independent. Note that the coefficient matrix of the
left hand side of (3.56) is the transpose of that for the system (3.54), (3.55).
Taking the inner product of (3.56) with $U$ gives

\[ \delta\, \mathrm{tr}(U) - \sum_{k=1}^m d_k \langle B_k, U \rangle
   = \beta \langle z z^T, U \rangle, \tag{3.57} \]

so from (3.54), (3.55) and the spectral decomposition of $U$

\[ \delta(\kappa - r) + b^T d = \beta\theta. \tag{3.58} \]

From (3.56) $B(d) = \delta I - \beta z z^T$ has eigenvalues
$\delta - \beta, \delta, \ldots, \delta$. If $\beta < 0$ the sum of the
$\kappa - r$ largest eigenvalues of $B(d)$ is $\delta(\kappa - r) - \beta$. Hence
(3.50) and (3.58) give

\[ f'_\kappa(x;d) = \beta(\theta - 1). \tag{3.59} \]

Thus if $\theta > 1$ choosing $\beta < 0$ and solving (3.56) produces a descent
direction. If $\beta > 0$ then, since $\kappa < r + t$, the sum of the
$\kappa - r$ largest eigenvalues of $B(d)$ is $\delta(\kappa - r)$. Then (3.50)
and (3.58) give

\[ f'_\kappa(x;d) = \beta\theta. \tag{3.60} \]

Therefore, if $\theta < 0$, choosing $\beta > 0$ and solving (3.56) produces a
descent direction. ∎

Remark: For the case $\kappa = 1$ this result was given in Overton (1988) and
Overton and Womersley (1988). As noted earlier the condition $U \le I$ is not
required when $\kappa = 1$.

Remark: Progress is made by splitting the multiple eigenvalue while maintaining
multiplicity $t - 1$ to first order, i.e., the first-order change in all the
eigenvalues but one in the cluster is the same. This is analogous to moving off
a single active nonlinear constraint in the context of constrained
optimization. The dual matrix $U$ provides the information which leads to the
generation of a descent direction, just as negative Lagrange multipliers
provide similar information in constrained optimization. The distinction
between the cases $\theta < 0$ and $\theta > 1$ is as follows: if $\theta < 0$,
one eigenvalue in the group of multiplicity $t$ is separated from the others by
a reduction, reducing the approximate multiplicity but leaving the number of
eigenvalues larger than $\lambda_\kappa$, to first order, unchanged. If
$\theta > 1$ one eigenvalue is separated from the others by an increase, again
reducing the approximate multiplicity but increasing the number of larger
eigenvalues (to first order). In either case the theorem guarantees an overall
reduction in $f_\kappa$.

Case 2 applies generically if $m + 1 = t(t+1)/2$, i.e. the manifold defined by
(3.53) generically consists of a single point. It also applies if $x$ minimizes
$f_\kappa$ on this manifold; see Overton (1988) for a further explanation.
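
The two linear solves of Case 2 and Theorem 3.12 can be sketched as follows, assuming NumPy and a square nonsingular system ($m + 1 = t(t+1)/2$). The function names `dual_matrix` and `splitting_direction`, the fixed choice $|\beta| = 1$, and the data are illustrative assumptions; the example takes $\kappa = 1$, $r = 0$, $t = 2$, so that $b = 0$ because $P_1$ has no columns.

```python
import numpy as np

def dual_matrix(B_list, b, kappa_minus_r):
    """Solve (3.54)-(3.55) for the symmetric t-by-t matrix U.

    Assumes Case 2 with m + 1 = t(t+1)/2, so the system is square and nonsingular.
    """
    t = B_list[0].shape[0]
    iu = np.triu_indices(t)
    # Off-diagonal unknowns appear twice in a Frobenius inner product.
    w = np.where(iu[0] == iu[1], 1.0, 2.0)
    M = np.array([np.eye(t)[iu] * w] + [-Bk[iu] * w for Bk in B_list])
    rhs = np.concatenate(([float(kappa_minus_r)], b))
    U = np.zeros((t, t))
    U[iu] = np.linalg.solve(M, rhs)
    return U + np.triu(U, 1).T               # fill in the lower triangle

def splitting_direction(B_list, U, tol=1e-12):
    """Return (delta, d, predicted) from Theorem 3.12, or None if 0 <= U <= I."""
    theta, Z = np.linalg.eigh(U)              # eigenvalues in ascending order
    if theta[0] < -tol:                       # theta < 0: choose beta > 0, use (3.60)
        beta, z = 1.0, Z[:, 0]
        predicted = beta * theta[0]
    elif theta[-1] > 1.0 + tol:               # theta > 1: choose beta < 0, use (3.59)
        beta, z = -1.0, Z[:, -1]
        predicted = beta * (theta[-1] - 1.0)
    else:
        return None
    t = U.shape[0]
    iu = np.triu_indices(t)
    # Unknowns (delta, d_1, ..., d_m); equations from the upper triangle of (3.56).
    M = np.column_stack([np.eye(t)[iu]] + [-Bk[iu] for Bk in B_list])
    sol = np.linalg.solve(M, beta * np.outer(z, z)[iu])
    return sol[0], sol[1:], float(predicted)

# Illustrative data: kappa = 1, r = 0, t = 2, m = 2, hence b = 0.
B_list = [np.array([[1.0, 0.0], [0.0, 2.0]]), np.array([[0.0, 1.0], [1.0, 0.0]])]
b = np.zeros(2)
U = dual_matrix(B_list, b, 1)
delta, d, predicted = splitting_direction(B_list, U)
Bd = sum(dk * Bk for dk, Bk in zip(d, B_list))
derivative = float(b @ d + np.linalg.eigvalsh(Bd)[-1])    # (3.50) with kappa - r = 1
print(U, d, derivative, predicted)
```

In this example the dual matrix has eigenvalues on both sides of $[0,1]$; the sketch simply uses the negative one first, and either choice is covered by the theorem.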

Case 3. Neither Case 1 nor Case 2 applies. In this case degeneracy is said to
occur. Generation of a descent direction is not straightforward.

3.7      Model  algorithms 

Practical algorithms for minimizing $f_\kappa(x)$ based on successive linear or
quadratic programming may be defined to fully exploit the structure of the
generalized gradient of $f_\kappa$. Such algorithms have been described and
tested extensively in the case $\kappa = 1$ by Overton (1988, 1990).

Suppose that $x^*$ is a (local) minimizer of $f_\kappa(x)$, with corresponding
values $r^*$ and $t^*$ defined by (3.8). The model algorithm must use estimates,
say $r$ and $t$, of $r^*$ and $t^*$. Note that, in general, the matrix iterates
generated by the algorithm will have eigenvalues which are strictly multiple
only in the limit at $x = x^*$. The simplest way to estimate $r$ and $t$ is to
use an eigenvalue separation tolerance. The basic step of a model algorithm for
minimizing $f_\kappa(x)$ then becomes the solution of the following quadratic
program:


\[ \min_{d \in \mathbb{R}^m,\; \delta \in \mathbb{R}} \quad
   (\kappa - r)\delta + b^T d + \tfrac{1}{2} d^T H d \tag{3.61} \]

\[ \text{subject to} \quad \delta I - \sum_{k=1}^m d_k Q_1^T A_k Q_1
   = \mathrm{diag}(\lambda_{r+1}, \ldots, \lambda_{r+t}). \tag{3.62} \]

Here $H$ is some positive semi-definite matrix and all the quantities $Q_1$,
$A_k$, $\lambda_i$ and $b$ (defined in (3.48)) are evaluated at the current point
$x$. The next trial point is $x + d$, and $\delta$ gives an estimate of
$\lambda_\kappa(x + d)$.

Briefly, equation (3.62) represents the appropriate linearization of the
nonlinear system

\[ \lambda_{r+1}(x + d) = \cdots = \lambda_{r+t}(x + d) = \delta; \tag{3.63} \]

since this system is not differentiable, this needs justification (Friedland,
Nocedal and Overton (1987)). The first two terms in the objective function
(3.61) represent a linearization of $f_\kappa(x + d)$, while the third term may be
used to incorporate second derivative information. The Lagrange multipliers
corresponding to the $t(t+1)/2$ equality constraints (3.62) make up a dual
matrix estimate of $U$.

Inequalities may be included in the quadratic program, to ensure that
linearizations of the first $r$ eigenvalues (or at least their average) are no
smaller than $\delta$, and that the linearizations of
$\lambda_{r+t+1}, \ldots$, are no larger than $\delta$. A trust region constraint
may be used to ensure reduction of the objective function at the trial point. It
is possible to choose $H$ so that local quadratic convergence takes place;
alternatively, $H$ can be set to zero for a first-order method using a linear
programming subproblem. Further details are beyond the scope of this paper, but
see Overton (1990) for the case $\kappa = 1$.
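
The estimation of $r$ and $t$ by an eigenvalue separation tolerance, mentioned above, might look as follows. This is only a sketch: the function name `estimate_r_t` and the particular relative threshold `tol * max(1, |lambda_kappa|)` are assumptions made here for illustration, and the paper does not prescribe a specific rule.

```python
def estimate_r_t(eigs, kappa, tol=1e-6):
    """Estimate (r, t) from a list of eigenvalues.

    Eigenvalues within tol * max(1, |lambda_kappa|) of lambda_kappa are
    treated as one multiple eigenvalue (hypothetical rule).
    """
    lam = sorted(eigs, reverse=True)          # ordering (1.1)
    lam_k = lam[kappa - 1]
    thresh = tol * max(1.0, abs(lam_k))
    r = sum(1 for v in lam if v > lam_k + thresh)       # strictly larger eigenvalues
    t = sum(1 for v in lam if abs(v - lam_k) <= thresh) # estimated multiplicity
    return r, t

eigs = [3.0, 1.0000001, 1.0, 0.9999999, 0.2]
print(estimate_r_t(eigs, 2, tol=1e-5))
print(estimate_r_t(eigs, 2, tol=1e-9))
```

A loose tolerance groups the three eigenvalues near 1 into a single cluster, while a tight one treats $\lambda_\kappa$ as simple, which changes the size of the equality constraint block (3.62).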

4      Sum  of  the  largest  eigenvalues  in  absolute  value 

We are now interested in functions of the form

\[ g_\kappa(A) = \sum_{i=1}^{\kappa} |\mu_i| \tag{4.1} \]

where the eigenvalues $\mu_i$ of $A$ are ordered by

\[ |\mu_1| \ge \cdots \ge |\mu_n|. \tag{4.2} \]

Let $\zeta = |\mu_\kappa|$, the $\kappa$th largest eigenvalue modulus.

One approach is to apply the results of the previous section to minimizing the
sum of the $\kappa$ largest eigenvalues of

\[ \begin{bmatrix} A(x) & 0 \\ 0 & -A(x) \end{bmatrix}. \]

However not only is the size of the problem then doubled, but the structure of
the subdifferential is lost.
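
The equivalence behind this doubling approach can be checked directly on eigenvalues, since the spectrum of the block matrix consists of the eigenvalues of $A$ together with their negatives. The sketch below works on an arbitrary illustrative list of eigenvalues; the function names are hypothetical.

```python
def f_kappa_from_eigs(eigs, kappa):
    """Sum of the kappa largest values (the function f_kappa on a spectrum)."""
    return sum(sorted(eigs, reverse=True)[:kappa])

def g_kappa_from_eigs(eigs, kappa):
    """Sum of the kappa largest moduli, as in (4.1)."""
    return sum(sorted((abs(v) for v in eigs), reverse=True)[:kappa])

lams = [3, 1, -2, -5]                       # illustrative spectrum of A
doubled = lams + [-v for v in lams]         # spectrum of diag(A, -A)
for kappa in range(1, len(lams) + 1):
    print(kappa, f_kappa_from_eigs(doubled, kappa), g_kappa_from_eigs(lams, kappa))
```

The two columns agree for every $\kappa \le n$ because the $n$ largest entries of the doubled spectrum are exactly the moduli $|\mu_i|$.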

The techniques in Section 4 are related to the well known idea of representing a
scalar $\alpha$ by its positive part $\alpha_+ = \max\{0, \alpha\}$ and its
negative part $\alpha_- = \max\{0, -\alpha\}$, so $\alpha = \alpha_+ - \alpha_-$
and $|\alpha| = \alpha_+ + \alpha_-$. For $w \in \mathbb{R}^n$ the vectors $w_+$
and $w_-$ are defined componentwise. This is discussed further in the Appendix.

4.1      A  max  characterization 

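This section establishes a max characterization of the sum of the $\kappa$
largest, in absolute value, eigenvalues of a matrix, and the elements which
achieve the maximum in this characterization. Let $\kappa \in \{1, \ldots, n\}$,
and define

\[ \Psi_{n,\kappa} = \{\, W \in \mathcal{S}_n : W = U - V, \text{ where } U, V \in \mathcal{S}_n,\;
   0 \le U \le I,\; 0 \le V \le I,\; \mathrm{tr}(U) + \mathrm{tr}(V) = \kappa \,\} \tag{4.3} \]

and

\[ \psi_{n,\kappa} = \{\, w \in \mathbb{R}^n : w = u - v, \text{ where } u, v \in \mathbb{R}^n,\;
   0 \le u_i \le 1,\; 0 \le v_i \le 1 \text{ for } i = 1,\ldots,n, \text{ and } e^T(u + v) = \kappa \,\}. \tag{4.4} \]

Lemma 4.1 $\Psi_{n,\kappa}$ and $\psi_{n,\kappa}$ are compact convex sets,
$\Psi_{n,\kappa}$ is invariant under orthogonal similarity transformations (i.e.
$W \in \Psi_{n,\kappa} \iff Z^T W Z \in \Psi_{n,\kappa}$ for any
$Z \in \mathcal{O}_{n,n}$), and $\Psi_{n,\kappa}$ and $\psi_{n,\kappa}$ are
related by

\[ \Psi_{n,\kappa} = \{\, W \in \mathcal{S}_n : W = Z D Z^T - Y E Y^T
   \text{ where } Z, Y \in \mathcal{O}_{n,n},\; D = \mathrm{diag}(u),\; E = \mathrm{diag}(v), \]
\[ \qquad 0 \le u_i \le 1,\; 0 \le v_i \le 1,\; i = 1,\ldots,n \text{ and } e^T(u + v) = \kappa \,\} \tag{4.5} \]

and

\[ \psi_{n,\kappa} = \{\, w \in \mathbb{R}^n : w_i = W_{ii} \text{ for } i = 1,\ldots,n
   \text{ where } W \in \Psi_{n,\kappa} \,\}. \tag{4.6} \]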


Proof: It is easily verified that $\Psi_{n,\kappa}$ and $\psi_{n,\kappa}$ are
compact convex sets. A spectral decomposition of the matrices
$U, V \in \mathcal{S}_n$ in the definition of $\Psi_{n,\kappa}$ shows that

\[ \Psi_{n,\kappa} = \{\, W \in \mathcal{S}_n : W = Z D Z^T - Y E Y^T
   \text{ where } Z, Y \in \mathcal{O}_{n,n},\; D, E \in \mathcal{D}_n, \]
\[ \qquad 0 \le D \le I,\; 0 \le E \le I \text{ and } \mathrm{tr}(D) + \mathrm{tr}(E) = \kappa \,\}, \tag{4.7} \]

which establishes (4.5), and the invariance of $\Psi_{n,\kappa}$ to orthogonal
similarity transformations. Let $\Omega$ denote the set on the right hand side
of equation (4.6). Choosing $Z = Y = I$, $D = \mathrm{diag}(w_+)$ and
$E = \mathrm{diag}(w_-)$ in (4.7), for any $w \in \psi_{n,\kappa}$, shows that
$\psi_{n,\kappa} \subseteq \Omega$. To establish the reverse inclusion let
$w \in \Omega$, i.e. $w_i = W_{ii}$, where $W = U - V$ with $U, V$ satisfying the
conditions in (4.3). Define $u$ and $v$ by $u_i = U_{ii}$, $v_i = V_{ii}$. Then
$e^T(u + v) = \mathrm{tr}(U) + \mathrm{tr}(V) = \kappa$, and $0 \le u \le e$,
$0 \le v \le e$, by (4.3), i.e. $w \in \psi_{n,\kappa}$. ∎

To characterize the elements which achieve the maximum in the following results
requires information about the multiplicity of the eigenvalues of $A$. Although
the ordering (4.2) is useful to give the simple definition (4.1) for
$g_\kappa(A)$, as well as $\zeta = |\mu_\kappa|$, we shall now revert to the
standard ordering.

Consider the case when $\zeta > 0$ and let the eigenvalues of $A$ be written

\begin{align*}
 & \lambda_1 \ge \cdots \ge \lambda_{r_1} > \\
 & \lambda_{r_1+1} = \cdots = \zeta = \cdots = \lambda_{r_1+t_1} > \\
 & \lambda_{r_1+t_1+1} \ge \cdots \ge \lambda_{n-r_2-t_2} > \tag{4.8} \\
 & \lambda_{n-r_2-t_2+1} = \cdots = -\zeta = \cdots = \lambda_{n-r_2} > \\
 & \lambda_{n-r_2+1} \ge \cdots \ge \lambda_n,
\end{align*}

where $r_1, t_1, r_2, t_2$ are nonnegative integers. The eigenvalue equal to
$\zeta$ has multiplicity $t_1$ and the eigenvalue equal to $-\zeta$ has
multiplicity $t_2$. By assumption there is at least one eigenvalue with modulus
$\zeta$, so $t_1 + t_2 \ge 1$. The number of eigenvalues greater than $\zeta$ is
$r_1$, while the number of eigenvalues less than $-\zeta$ is $r_2$. Note that by
definition

\[ r_1 + r_2 + 1 \le \kappa \le r_1 + t_1 + r_2 + t_2. \tag{4.9} \]

Thus $r_1 = r_2 = 0$ if $\kappa = 1$. Also if $t_1 = 1$ and $t_2 = 0$, or
$t_1 = 0$ and $t_2 = 1$ then $\kappa = r_1 + r_2 + 1$.

We have $\lambda_i > 0$ for $i = 1, \ldots, r_1 + t_1$ and $\lambda_i < 0$ for
$i = n - r_2 - t_2 + 1, \ldots, n$. Thus

\[ g_\kappa(A) = \sum_{i=1}^{\kappa} |\mu_i|
   = \sum_{i=1}^{r_1} \lambda_i - \sum_{i=n-r_2+1}^{n} \lambda_i
     + (\kappa - r_1 - r_2)\zeta. \tag{4.10} \]

Example 4.2 Suppose $n = 11$ and
$\lambda^T = (5, 4, 4, 4, 2, -1, -4, -4, -6, -6, -7)$. If $\kappa = 2$ or $3$
then $\zeta = 6$, $r_1 = 0$, $t_1 = 0$, $r_2 = 1$ and $t_2 = 2$. If
$\kappa = 5, 6,$ or $7$ then $\zeta = 4$, $r_1 = 1$, $t_1 = 3$, $r_2 = 3$ and
$t_2 = 2$.
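
The quantities in Example 4.2 can be reproduced mechanically. The following sketch, written in plain Python with hypothetical function names, computes $\zeta, r_1, t_1, r_2, t_2$ from a list of eigenvalues and compares the definition (4.1) with the expression (4.10); it assumes $\zeta > 0$, as in the text, and uses exact comparisons, which is adequate only for exactly represented data such as the integers of the example.

```python
def classify(lams, kappa, tol=0.0):
    """Return (zeta, r1, t1, r2, t2) for the ordering (4.8); assumes zeta > 0."""
    mods = sorted((abs(v) for v in lams), reverse=True)
    zeta = mods[kappa - 1]                              # kappa-th largest modulus
    r1 = sum(1 for v in lams if v > zeta + tol)         # eigenvalues above zeta
    t1 = sum(1 for v in lams if abs(v - zeta) <= tol)   # eigenvalues equal to zeta
    r2 = sum(1 for v in lams if v < -zeta - tol)        # eigenvalues below -zeta
    t2 = sum(1 for v in lams if abs(v + zeta) <= tol)   # eigenvalues equal to -zeta
    return zeta, r1, t1, r2, t2

def g_kappa(lams, kappa):
    """Definition (4.1): sum of the kappa largest moduli."""
    return sum(sorted((abs(v) for v in lams), reverse=True)[:kappa])

def g_kappa_via_4_10(lams, kappa):
    """Right-hand side of (4.10)."""
    zeta, r1, t1, r2, t2 = classify(lams, kappa)
    lam = sorted(lams, reverse=True)
    n = len(lam)
    return sum(lam[:r1]) - sum(lam[n - r2:]) + (kappa - r1 - r2) * zeta

lams = [5, 4, 4, 4, 2, -1, -4, -4, -6, -6, -7]
for kappa in (2, 3, 5, 6, 7):
    print(kappa, classify(lams, kappa), g_kappa(lams, kappa), g_kappa_via_4_10(lams, kappa))
```

By (4.9) the second set of values is in fact returned for every $\kappa$ from 5 to 9.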

If $\zeta = 0$ then the appropriate ordering is

\begin{align*}
 & \lambda_1 \ge \cdots \ge \lambda_{r_1} > \\
 & \lambda_{r_1+1} = \cdots = 0 = \cdots = \lambda_{r_1+t} > \tag{4.11} \\
 & \lambda_{n-r_2+1} \ge \cdots \ge \lambda_n,
\end{align*}

and $n - r_2 = r_1 + t$. Only the situation where $\zeta > 0$ will be considered
in the rest of this section. The modifications required when $\zeta = 0$ can be
derived from the ordering (4.11).

We  first  give  two  lemmas  which  depend  only  on  the  orderings  (4.2),  (4.8)  for 
a  single  set  of  n  real  numbers,  and  not  on  the  fact  that  these  are  eigenvalues  of  a 
matrix  A. 

Lemma 4.3 If the vector $\lambda \in \mathbb{R}^n$ is ordered as in (4.8) then

\[ \max_{w \in \psi_{n,\kappa}} \lambda^T w = \sum_{i=1}^{\kappa} |\mu_i| \]

and

\begin{align*}
\mathrm{argmax}\,\{\, \lambda^T w : w \in \psi_{n,\kappa} \,\} = \{\, w \in \mathbb{R}^n : \;
 & w_i = 1 \text{ for } i = 1, \ldots, r_1, \\
 & 0 \le w_i \le 1 \text{ for } i = r_1 + 1, \ldots, r_1 + t_1, \\
 & w_i = 0 \text{ for } i = r_1 + t_1 + 1, \ldots, n - r_2 - t_2, \\
 & -1 \le w_i \le 0 \text{ for } i = n - r_2 - t_2 + 1, \ldots, n - r_2, \\
 & w_i = -1 \text{ for } i = n - r_2 + 1, \ldots, n \text{ and} \\
 & \sum_{i=r_1+1}^{r_1+t_1} w_i - \sum_{i=n-r_2-t_2+1}^{n-r_2} w_i = \kappa - r_1 - r_2 \,\}. \tag{4.12}
\end{align*}

Proof: The result follows directly from (4.2), (4.8) and the definition (4.4) of
$\psi_{n,\kappa}$. ∎
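
One element of the argmax set (4.12) can be constructed explicitly. The sketch below, in plain Python with the hypothetical name `maximizer`, sets $w_i = \pm 1$ on the eigenvalues of modulus larger than $\zeta$ and then spends the remaining $\kappa - r_1 - r_2$ units on the eigenvalues of modulus $\zeta$ in order of appearance; it assumes $\zeta > 0$ and exactly represented data, and it returns only one vertex of the set, not the whole set.

```python
def maximizer(lams, kappa):
    """Return one w in the argmax set (4.12); assumes zeta > 0 and exact data."""
    zeta = sorted((abs(v) for v in lams), reverse=True)[kappa - 1]
    w = [0.0] * len(lams)
    remaining = kappa
    for i, v in enumerate(lams):            # entries forced to +1 or -1
        if abs(v) > zeta:
            w[i] = 1.0 if v > 0 else -1.0
            remaining -= 1
    for i, v in enumerate(lams):            # distribute kappa - r1 - r2 on the ties
        if abs(v) == zeta and remaining > 0:
            w[i] = 1.0 if v > 0 else -1.0
            remaining -= 1
    return w

lams = [5, 4, 4, 4, 2, -1, -4, -4, -6, -6, -7]   # data of Example 4.2
w = maximizer(lams, 6)
value = sum(l * wi for l, wi in zip(lams, w))
print(w, value)
```

Any other way of distributing the remaining weight over the tied entries, including fractional weights, gives the same value of $\lambda^T w$, which is the non-uniqueness described by (4.12).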


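Lemma 4.4 Let $\Lambda = \mathrm{diag}(\lambda)$. Then

\[ \max_{W \in \Psi_{n,\kappa}} \langle \Lambda, W \rangle = \sum_{i=1}^{\kappa} |\mu_i| \tag{4.13} \]

and

\begin{align*}
\mathrm{argmax}\,\{\, \langle \Lambda, W \rangle : W \in \Psi_{n,\kappa} \,\} = \{\,
 & W \in \mathcal{S}_n : W = U - V, \text{ where} \\
 & U = \mathrm{diag}(I, \tilde U, 0, 0, 0), \quad \tilde U \in \mathcal{S}_{t_1}, \quad 0 \le \tilde U \le I, \\
 & V = \mathrm{diag}(0, 0, 0, \tilde V, I), \quad \tilde V \in \mathcal{S}_{t_2}, \quad 0 \le \tilde V \le I, \\
 & \mathrm{tr}(\tilde U) + \mathrm{tr}(\tilde V) = \kappa - r_1 - r_2 \,\}. \tag{4.14}
\end{align*}

Here the diagonal blocks of $U$ and $V$ have dimensions $r_1$, $t_1$,
$n - r_1 - t_1 - t_2 - r_2$, $t_2$ and $r_2$ respectively.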

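Proof: As $\Lambda = \mathrm{diag}(\lambda)$

\[ \langle \Lambda, W \rangle = \sum_{i=1}^{n} \lambda_i W_{ii}. \]

Equation (4.13) follows from Lemmas 4.1 and 4.3.

If $W^*$ is any element of the right hand side of (4.14) then from (4.8) and
(4.10)

\[ \langle \Lambda, W^* \rangle = \sum_{i=1}^{r_1} \lambda_i + \zeta\,\mathrm{tr}(\tilde U)
   + \zeta\,\mathrm{tr}(\tilde V) - \sum_{i=n-r_2+1}^{n} \lambda_i
   = \sum_{i=1}^{\kappa} |\mu_i|. \tag{4.15} \]

Conversely let $W^* \in \mathrm{argmax}\,\{ \langle \Lambda, W \rangle : W \in \Psi_{n,\kappa} \}$.
Then $w^* = (W^*_{11} \ldots W^*_{nn})^T \in \psi_{n,\kappa}$ satisfies
$\lambda^T w^* = \sum_{i=1}^{\kappa} |\mu_i|$, and therefore also satisfies the
properties given on the right-hand side of (4.12). Furthermore, $W^*$ has the
representation $W^* = U^* - V^*$, where $0 \le U^* \le I$, $0 \le V^* \le I$,
$\mathrm{tr}(U^*) + \mathrm{tr}(V^*) = \kappa$, so

\begin{align*}
 & U^*_{ii} = 1, \quad V^*_{ii} = 0 \quad \text{for } i = 1, \ldots, r_1, \\
 & 0 \le U^*_{ii} \le 1, \quad V^*_{ii} = 0 \quad \text{for } i = r_1 + 1, \ldots, r_1 + t_1, \\
 & U^*_{ii} = 0, \quad V^*_{ii} = 0 \quad \text{for } i = r_1 + t_1 + 1, \ldots, n - r_2 - t_2, \tag{4.16} \\
 & U^*_{ii} = 0, \quad 0 \le V^*_{ii} \le 1 \quad \text{for } i = n - r_2 - t_2 + 1, \ldots, n - r_2, \\
 & U^*_{ii} = 0, \quad V^*_{ii} = 1 \quad \text{for } i = n - r_2 + 1, \ldots, n \text{ and} \\
 & \sum_{i=r_1+1}^{r_1+t_1} U^*_{ii} + \sum_{i=n-r_2-t_2+1}^{n-r_2} V^*_{ii} = \kappa - r_1 - r_2.
\end{align*}

Partition the rows and columns of $W^* = U^* - V^*$ into blocks so

\[ U^* = \begin{bmatrix}
 C_{11} & C_{12} & C_{13} & C_{14} & C_{15} \\
 C_{12}^T & C_{22} & C_{23} & C_{24} & C_{25} \\
 C_{13}^T & C_{23}^T & C_{33} & C_{34} & C_{35} \\
 C_{14}^T & C_{24}^T & C_{34}^T & C_{44} & C_{45} \\
 C_{15}^T & C_{25}^T & C_{35}^T & C_{45}^T & C_{55}
 \end{bmatrix}, \qquad
 V^* = \begin{bmatrix}
 E_{11} & E_{12} & E_{13} & E_{14} & E_{15} \\
 E_{12}^T & E_{22} & E_{23} & E_{24} & E_{25} \\
 E_{13}^T & E_{23}^T & E_{33} & E_{34} & E_{35} \\
 E_{14}^T & E_{24}^T & E_{34}^T & E_{44} & E_{45} \\
 E_{15}^T & E_{25}^T & E_{35}^T & E_{45}^T & E_{55}
 \end{bmatrix}, \]

where the dimensions of the symmetric matrices $C_{ii}$ and $E_{ii}$ for
$i = 1, \ldots, 5$ are respectively $r_1$, $t_1$, $n - r_1 - t_1 - r_2 - t_2$,
$t_2$ and $r_2$. As $W^* \in \Psi_{n,\kappa}$

\begin{align*}
 & 0 \le C_{ii} \le I, \quad i = 1, \ldots, 5, \\
 & 0 \le E_{ii} \le I, \quad i = 1, \ldots, 5, \\
 & \sum_{i=1}^{5} \mathrm{tr}(C_{ii}) + \sum_{i=1}^{5} \mathrm{tr}(E_{ii}) = \kappa.
\end{align*}

As in Section 3 the proof is completed using the basic results (3.16) and (3.17)
for positive semi-definite matrices. From (4.16) $C_{11} = I$, as
$C_{11} \le I$ and the diagonal elements of $C_{11}$ are all 1. Then as
$U^* \le I$, $C_{11} = I$ implies that $C_{1j} = 0$ for all $j \ne 1$. For
$i = 3, 4, 5$, $C_{ii} = 0$, as $C_{ii} \ge 0$ and the diagonal elements of
$C_{ii}$ are all zero. Then for $i = 3, 4, 5$, $U^* \ge 0$ and $C_{ii} = 0$ imply
that $C_{ij} = 0$ for all $j \ne i$. Similarly $E_{ii} = 0$ for $i = 1, 2, 3$ and
$E_{55} = I$. Then $0 \le V^* \le I$ implies that $E_{ij} = 0$ for all
$i \ne j$. The matrices $U^*$ and $V^*$ are now reduced to those given in (4.14).
The trace condition comes directly from (4.16). ∎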

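Now consider sums of the eigenvalues of $A$, and recall that
$Q \in \mathcal{O}_{n,n}$ is a matrix of eigenvectors for $A$ satisfying

\[ Q^T A Q = \Lambda. \tag{4.17} \]

Let $P_1 \in \mathcal{O}_{n,r_1}$, $P_2 \in \mathcal{O}_{n,r_2}$ be the matrices
consisting of the first $r_1$ and last $r_2$ columns of $Q$ respectively. Also
let $Q_1 \in \mathcal{O}_{n,t_1}$ be the matrix consisting of columns
$r_1 + 1, \ldots, r_1 + t_1$ of $Q$, and let $Q_2 \in \mathcal{O}_{n,t_2}$ be the
matrix consisting of columns $n - r_2 - t_2 + 1, \ldots, n - r_2$ of $Q$. By
(4.8) we have

\[ P_1^T A P_1 = \mathrm{diag}(\lambda_1, \ldots, \lambda_{r_1}), \qquad
   P_2^T A P_2 = \mathrm{diag}(\lambda_{n-r_2+1}, \ldots, \lambda_n), \tag{4.18} \]

and

\[ Q_1^T A Q_1 = \zeta I, \qquad Q_2^T A Q_2 = -\zeta I. \tag{4.19} \]

Theorem 4.5

\[ \max_{W \in \Psi_{n,\kappa}} \langle A, W \rangle = g_\kappa(A), \tag{4.20} \]

with

\begin{align*}
\mathrm{argmax}\,\{\, \langle A, W \rangle : W \in \Psi_{n,\kappa} \,\} = \{\,
 & W \in \mathcal{S}_n : W = U - V, \text{ where } U, V \in \mathcal{S}_n, \\
 & U = P_1 P_1^T + Q_1 \tilde U Q_1^T, \qquad V = P_2 P_2^T + Q_2 \tilde V Q_2^T, \\
 & \tilde U \in \mathcal{S}_{t_1},\; 0 \le \tilde U \le I, \qquad \tilde V \in \mathcal{S}_{t_2},\; 0 \le \tilde V \le I \\
 & \text{and } \mathrm{tr}(\tilde U) + \mathrm{tr}(\tilde V) = \kappa - r_1 - r_2 \,\}. \tag{4.21}
\end{align*}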


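Proof: For any $W \in \Psi_{n,\kappa}$ equation (4.17) and the properties of the
Frobenius inner product imply that

\[ \langle A, W \rangle = \langle Q \Lambda Q^T, W \rangle = \langle \Lambda, Q^T W Q \rangle, \tag{4.22} \]

with $Q^T W Q \in \Psi_{n,\kappa}$ as $\Psi_{n,\kappa}$ is invariant to orthogonal
similarity transformations. Hence

\[ \max_{W \in \Psi_{n,\kappa}} \langle A, W \rangle
   = \max_{W \in \Psi_{n,\kappa}} \langle \Lambda, W \rangle = g_\kappa(A), \]

where the second equality follows from Lemma 4.4.

From (4.22) and the invariance of $\Psi_{n,\kappa}$ to orthogonal transformations

\[ \mathrm{argmax}\,\{\, \langle A, W \rangle : W \in \Psi_{n,\kappa} \,\}
   = \{\, Q W^* Q^T : W^* \in \Omega \,\} \]

where $\Omega = \mathrm{argmax}\,\{ \langle \Lambda, W \rangle : W \in \Psi_{n,\kappa} \}$.
The proof is completed by applying Lemma 4.4 since

\begin{align*}
Q W^* Q^T &= Q\, \mathrm{diag}(I, \tilde U, 0, 0, 0)\, Q^T - Q\, \mathrm{diag}(0, 0, 0, \tilde V, I)\, Q^T \\
          &= P_1 P_1^T + Q_1 \tilde U Q_1^T - Q_2 \tilde V Q_2^T - P_2 P_2^T. \qquad ∎
\end{align*}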
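
Theorem 4.5 can be illustrated numerically by building a matrix with a prescribed spectrum and one element of the set (4.21). The sketch below assumes NumPy; the spectrum, the orthogonal matrix $Q$ (taken from a QR factorization of a seeded random matrix) and the particular blocks `Ut`, `Vt` standing for $\tilde U$, $\tilde V$ are arbitrary illustrative choices.

```python
import numpy as np

lam = np.array([5.0, 4.0, 4.0, 1.0, -4.0, -6.0])   # ordered as in (4.8), zeta = 4
kappa, r1, t1, r2, t2 = 3, 1, 2, 1, 1
n = lam.size

# Any orthogonal Q will do; take the Q factor of a fixed random matrix.
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T                          # so that Q^T A Q = Lambda, (4.17)

P1, Q1 = Q[:, :r1], Q[:, r1:r1 + t1]
Q2, P2 = Q[:, n - r2 - t2:n - r2], Q[:, n - r2:]

# One admissible pair: eigenvalues in [0, 1] and tr(Ut) + tr(Vt) = kappa - r1 - r2 = 1.
Ut = np.array([[0.5, 0.2], [0.2, 0.1]])
Vt = np.array([[0.4]])

U = P1 @ P1.T + Q1 @ Ut @ Q1.T
V = P2 @ P2.T + Q2 @ Vt @ Q2.T
W = U - V                                           # element of the set (4.21)

g = float(np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1][:kappa].sum())   # g_kappa(A), (4.1)
inner = float(np.trace(A @ W))                      # <A, W> = tr(A W^T), W symmetric
print(g, inner)
```

Changing `Ut` and `Vt` within the stated constraints leaves `inner` unchanged, which is the non-uniqueness of the maximizer when $\kappa < r_1 + t_1 + r_2 + t_2$.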


Remark: If $\kappa = r_1 + r_2 + t_1 + t_2$ then $\tilde U = I$, $\tilde V = I$
and

\[ \mathrm{argmax}\,\{\, \langle A, W \rangle : W \in \Psi_{n,\kappa} \,\}
   = P_1 P_1^T + Q_1 Q_1^T - Q_2 Q_2^T - P_2 P_2^T. \tag{4.23} \]

The matrix achieving the maximum in (4.13) is unique only in this case. It is
precisely the case when $g_\kappa(A)$ is a smooth function of the elements of $A$.

Remark: If $t_1 = 1$, $t_2 = 1$, and $\kappa = r_1 + r_2 + 1$, then
$\tilde U = \alpha$, $\tilde V = \beta$ where $\alpha, \beta \in \mathbb{R}$
satisfy $\alpha, \beta \in [0,1]$ and $\alpha + \beta = \kappa - r_1 - r_2 = 1$.
Also

\[ \mathrm{argmax}\,\{\, \langle A, W \rangle : W \in \Psi_{n,\kappa} \,\}
   = P_1 P_1^T + \alpha Q_1 Q_1^T - \beta Q_2 Q_2^T - P_2 P_2^T; \tag{4.24} \]

here $Q_1$ and $Q_2$ have only one column. This is precisely the case when
$g_\kappa(A)$ is nonsmooth but is the maximum of a finite number of smooth
functions (involving $\lambda_{r_1+1}$ and $-\lambda_{n-r_2}$).
Remark: When $t_1 > 1$ or $t_2 > 1$ all the flexibility in the choices of $Q_1$
and $Q_2$ in (4.19) is absorbed into the matrices $\tilde U$ and $\tilde V$. It
makes no difference if any of the eigenvalues $\lambda_1, \ldots, \lambda_{r_1}$
or $\lambda_{n-r_2+1}, \ldots, \lambda_n$ have multiplicity greater than 1, as all
these multiple eigenvalues are included in $g_\kappa(A)$ and the corresponding
columns of $P_1$ or $P_2$ can be any orthonormal basis for the corresponding
eigenspace.

Corollary 4.6 The function $g_\kappa(A) : \mathcal{S}_n \to \mathbb{R}$ is convex
and its subdifferential $\partial g_\kappa(A)$ is the nonempty compact convex set

\begin{align*}
\partial g_\kappa(A) = \{\, W \in \mathcal{S}_n : \text{ there exists }
 & U \in \mathcal{S}_{t_1} \text{ with } 0 \le U \le I, \\
 & V \in \mathcal{S}_{t_2} \text{ with } 0 \le V \le I, \\
 & \mathrm{tr}(U) + \mathrm{tr}(V) = \kappa - r_1 - r_2, \\
 & W = P_1 P_1^T + Q_1 U Q_1^T - Q_2 V Q_2^T - P_2 P_2^T \,\}. \tag{4.25}
\end{align*}

Proof: For any $W \in \Psi_{n,\kappa}$ the inner product $\langle A, W \rangle$ is
a linear function of $A \in \mathcal{S}_n$, so from Rockafellar (1970)
$g_\kappa(A)$ is a convex function of $A$. Moreover the subdifferential is

\[ \partial g_\kappa(A) = \mathrm{conv}\,\{\, W \in \Psi_{n,\kappa} : \langle A, W \rangle = g_\kappa(A) \,\}. \]

The result follows from Theorem 4.5 as the set on the right hand side of (4.25)
is already convex. ∎

As in Section 3 there are various equivalent forms for the argmax involving a
convex hull operation, which are discussed in the Appendix. These lead to
equivalent, but computationally less useful, expressions for
$\partial g_\kappa(A)$.

4.2      The  generalized  gradient 

As in Section 3.2 let $A(x) : \mathbb{R}^m \to \mathcal{S}_n$ be a continuously
differentiable function with partial derivatives

\[ A_k(x) = \frac{\partial A(x)}{\partial x_k} \quad \text{for } k = 1, \ldots, m. \]

This section is concerned with finding a computationally useful
characterization of the generalized gradient of the function

\[ g_\kappa(x) = g_\kappa(A(x)). \]

The eigenvalues of $A(x)$ are, as before, denoted by both (4.2) and (4.8), with
$r_1, t_1, r_2, t_2$ now dependent on $x$, and with the corresponding
eigenvectors satisfying (4.18), (4.19). Thus the explicit dependence of the
eigensystem of $A(x)$ on $x$ is generally omitted.

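Theorem 4.7 The function $g_\kappa(x)$ is locally Lipschitz, subdifferentially
regular, and its generalized gradient is the nonempty compact convex set

\begin{align*}
\partial g_\kappa(x) = \{\, w \in \mathbb{R}^m : \;
 & \exists\, U \in \mathcal{S}_{t_1},\; V \in \mathcal{S}_{t_2} \text{ with} \tag{4.26} \\
 & 0 \le U \le I,\; 0 \le V \le I,\; \mathrm{tr}(U) + \mathrm{tr}(V) = \kappa - r_1 - r_2, \\
 & \text{and for } k = 1, \ldots, m \\
 & w_k = \mathrm{tr}(P_1^T A_k(x) P_1) + \langle Q_1^T A_k(x) Q_1, U \rangle \\
 & \qquad - \mathrm{tr}(P_2^T A_k(x) P_2) - \langle Q_2^T A_k(x) Q_2, V \rangle \,\}. \tag{4.27}
\end{align*}

Proof: By the Clarke chain rule, $g_\kappa(x)$ is locally Lipschitz,
subdifferentially regular, and

\[ \partial g_\kappa(x) = \{\, w \in \mathbb{R}^m : w_k = \langle A_k(x), W \rangle,\;
   k = 1, \ldots, m,\; W \in \partial g_\kappa(A(x)) \,\}. \]

Corollary 4.6 and the properties of the inner product complete the proof. ∎

Corollary 4.8 If $\kappa = r_1 + t_1 + r_2 + t_2$ then the function
$g_\kappa(x)$ is differentiable at $x$ with

\begin{align*}
\frac{\partial g_\kappa(x)}{\partial x_k}
  &= \mathrm{tr}(P_1^T A_k(x) P_1) + \mathrm{tr}(Q_1^T A_k(x) Q_1) \\
  &\quad - \mathrm{tr}(Q_2^T A_k(x) Q_2) - \mathrm{tr}(P_2^T A_k(x) P_2). \tag{4.28}
\end{align*}

Proof: This is precisely the case when (4.21) and hence $\partial g_\kappa(x)$ is
a singleton. Hence $g_\kappa(x)$ is differentiable, and (4.27) reduces to
(4.28). ∎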

4.3      Necessary  conditions 

If $x$ is a local minimizer of $g_\kappa$ then the standard necessary condition
$0 \in \partial g_\kappa(x)$ and Theorem 4.7 imply there exist
$U \in \mathcal{S}_{t_1}$ and $V \in \mathcal{S}_{t_2}$ such that

\[ 0 \le U \le I, \qquad 0 \le V \le I, \tag{4.29} \]

\[ \mathrm{tr}(U) + \mathrm{tr}(V) = \kappa - r_1 - r_2, \tag{4.30} \]

and for $k = 1, \ldots, m$

\begin{align*}
0 &= \mathrm{tr}(P_1^T A_k(x) P_1) + \langle Q_1^T A_k(x) Q_1, U \rangle \\
  &\quad - \mathrm{tr}(P_2^T A_k(x) P_2) - \langle Q_2^T A_k(x) Q_2, V \rangle. \tag{4.31}
\end{align*}

These conditions are useful computationally as one can relax the inequalities
on $U$ and $V$ and solve (4.30) and (4.31) for $U$ and $V$. This requires solving
a system of $m + 1$ linear equations for the $t_1(t_1+1)/2 + t_2(t_2+1)/2$
unknowns in the symmetric matrices $U$ and $V$. When $\kappa = 1$ these are the
same conditions as those used in Overton (1988). If the inequalities
$0 \le U \le I$ or $0 \le V \le I$ are not satisfied then this information can
be used to generate a descent direction; see Section 4.6.

If $g_\kappa(x)$ is convex (e.g. if $A(x)$ is affine) then equations (4.29),
(4.30) and (4.31) are both necessary and sufficient for $x$ to be a minimizer of
$g_\kappa(x)$.

4.4      A  saddle  point  result 

As in Section 3.4 it is possible to establish saddle point results for the
Lagrangian function

\[ \mathcal{L}(x,W) = \langle A(x), W \rangle, \tag{4.32} \]

where $W \in \Psi_{n,\kappa}$. The primal problem is

\[ \min_{x \in \mathbb{R}^m} g_\kappa(x) \tag{4.33} \]

where

\[ g_\kappa(x) = \max_{W \in \Psi_{n,\kappa}} \mathcal{L}(x,W). \tag{4.34} \]

The dual problem is

\[ \max_{W \in \Psi_{n,\kappa}} h(W), \tag{4.35} \]

where

\[ h(W) = \min_{x \in \mathbb{R}^m} \mathcal{L}(x,W). \tag{4.36} \]

Theorem 4.9 For each $W \in \Psi_{n,\kappa}$ let $\mathcal{L}(\cdot,W)$ be a
convex function, and let the primal problem attain its solution at $x^*$. Then
the primal and dual problems have the same optimal value so

\[ \min_{x \in \mathbb{R}^m} \max_{W \in \Psi_{n,\kappa}} \mathcal{L}(x,W)
   = \max_{W \in \Psi_{n,\kappa}} \min_{x \in \mathbb{R}^m} \mathcal{L}(x,W), \tag{4.37} \]

and $W^* \in \mathrm{argmax}\,\{ \langle A(x^*), W \rangle : W \in \Psi_{n,\kappa} \}$
satisfying $\langle A_k(x^*), W^* \rangle = 0$ for $k = 1, \ldots, m$ solves the
dual problem.

Proof: Analogous to the proof of Theorem 3.11. ∎

4.5      The  directional  derivative

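As $g_\kappa(x)$ is subdifferentially regular the standard one-sided directional
derivative $g'_\kappa(x;d)$ at $x$ in a direction $d \in \mathbb{R}^m$ exists and
satisfies

\begin{align*}
g'_\kappa(x;d) &= \max_{w \in \partial g_\kappa(x)} w^T d \\
  &= \sum_{k=1}^m d_k \left[ \mathrm{tr}(P_1^T A_k P_1) - \mathrm{tr}(P_2^T A_k P_2) \right] \\
  &\quad + \max_{U,V} \sum_{k=1}^m d_k \left[ \langle Q_1^T A_k Q_1, U \rangle
      - \langle Q_2^T A_k Q_2, V \rangle \right], \tag{4.38}
\end{align*}

where the maximum is over all $U \in \mathcal{S}_{t_1}$, $V \in \mathcal{S}_{t_2}$
satisfying

\[ 0 \le U \le I, \quad 0 \le V \le I \quad \text{and} \quad
   \mathrm{tr}(U) + \mathrm{tr}(V) = \kappa - r_1 - r_2. \tag{4.39} \]

Recall that the matrices $P_1$, $P_2$ and $Q_1$, $Q_2$ are evaluated at the point
$x$, and that $A_k$ is the partial derivative of $A(x)$ with respect to $x_k$
evaluated at the point $x$. For $k = 1, \ldots, m$ define

\[ b_k = \mathrm{tr}(P_1^T A_k P_1) - \mathrm{tr}(P_2^T A_k P_2), \tag{4.40} \]

and

\[ B_k = \begin{bmatrix} Q_1^T A_k Q_1 & 0 \\ 0 & -Q_2^T A_k Q_2 \end{bmatrix}. \tag{4.41} \]

Note that $B_k \in \mathcal{S}_{t_1+t_2}$. Let

\[ B(d) = \sum_{k=1}^m d_k B_k, \tag{4.42} \]

and let the eigenvalues of $B(d)$ be $\gamma_1 \ge \cdots \ge \gamma_{t_1+t_2}$.
Also let

\[ T = \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix}. \tag{4.43} \]

Then from Theorem 4.5 and (4.38) and (4.39) it follows that

\[ g'_\kappa(x;d) = b^T d + \max\, \langle B(d), T \rangle \]

where the maximum is over all matrices $T$ satisfying (4.39) and (4.43). Since
$B(d)$ is block diagonal, the maximum may equivalently be taken over matrices
$T \in \Phi_{t_1+t_2,\,\kappa-r_1-r_2}$. Hence the directional derivative is

\[ g'_\kappa(x;d) = b^T d + \sum_{i=1}^{\kappa-r_1-r_2} \gamma_i, \tag{4.44} \]

the sum of all $r_1$ eigenvalues of $\sum_{k=1}^m d_k P_1^T A_k P_1$ minus the sum
of all $r_2$ eigenvalues of $\sum_{k=1}^m d_k P_2^T A_k P_2$ plus the sum of the
$\kappa - r_1 - r_2$ largest eigenvalues of $B(d)$.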
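
Formula (4.44) can be compared with a forward finite difference of $g_\kappa$. The sketch below assumes NumPy, $\zeta > 0$, and an affine family $A(x) = A_0 + \sum_k x_k A_k$ evaluated at $x = 0$; the function names, the matrices and the tolerance `tol` are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

def g_kappa(A, kappa):
    """Sum of the kappa largest eigenvalue moduli of the symmetric matrix A, (4.1)."""
    return float(np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1][:kappa].sum())

def g_dir_derivative(A0, A_list, kappa, d, tol=1e-8):
    """Evaluate (4.44) where A(x) = A0 and dA/dx_k = A_list[k]; assumes zeta > 0."""
    lam, Q = np.linalg.eigh(A0)
    zeta = np.sort(np.abs(lam))[::-1][kappa - 1]
    P1 = Q[:, lam > zeta + tol]                    # eigenvectors for eigenvalues > zeta
    Q1 = Q[:, np.abs(lam - zeta) <= tol]           # eigenvectors for eigenvalue zeta
    Q2 = Q[:, np.abs(lam + zeta) <= tol]           # eigenvectors for eigenvalue -zeta
    P2 = Q[:, lam < -zeta - tol]                   # eigenvectors for eigenvalues < -zeta
    r1, r2 = P1.shape[1], P2.shape[1]
    t1, t2 = Q1.shape[1], Q2.shape[1]
    Ad = sum(dk * Ak for dk, Ak in zip(d, A_list)) # sum_k d_k A_k
    bTd = np.trace(P1.T @ Ad @ P1) - np.trace(P2.T @ Ad @ P2)   # b^T d, (4.40)
    Bd = np.zeros((t1 + t2, t1 + t2))              # B(d), (4.41)-(4.42)
    Bd[:t1, :t1] = Q1.T @ Ad @ Q1
    Bd[t1:, t1:] = -(Q2.T @ Ad @ Q2)
    gamma = np.sort(np.linalg.eigvalsh(Bd))[::-1]
    return float(bTd + gamma[:kappa - r1 - r2].sum())

# Illustrative data: eigenvalues (3, 2, -1, -2), kappa = 2, so zeta = 2,
# r1 = 1, t1 = 1, t2 = 1, r2 = 0.
A0 = np.diag([3.0, 2.0, -1.0, -2.0])
A_list = [np.array([[1.0, 0.5, 0.0, 0.2],
                    [0.5, -1.0, 0.3, 0.0],
                    [0.0, 0.3, 0.5, 0.1],
                    [0.2, 0.0, 0.1, 2.0]]),
          np.diag([0.0, 1.0, 0.0, 1.0])]
d = np.array([1.0, 0.5])
kappa = 2

formula = g_dir_derivative(A0, A_list, kappa, d)
alpha = 1e-7
Ad = sum(dk * Ak for dk, Ak in zip(d, A_list))
finite_diff = (g_kappa(A0 + alpha * Ad, kappa) - g_kappa(A0, kappa)) / alpha
print(formula, finite_diff)
```

In this example the eigenvalue at $-\zeta$ moves towards zero faster than the eigenvalue at $+\zeta$, so the largest eigenvalue of $B(d)$ comes from the upper block.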
4.6      Splitting  multiple  eigenvalues

This  section  discusses  the  generation  of  a  descent  direction  from  points  where  the 
first-order  optimahty  conditions  do  not  hold.  In  the  first  case  a  descent  direction 
is  generated  keeping  the  eigenvalues  corresponding  to  (  of  multiplicity  ^i,  and  the 
eigenvalues  corresponding  to  —(,  of  multiplicity  ^2  (to  first  order).  The  second  case 
requires  splitting  at  least  one  eigenvalue  away  from  the  common  value  ±^. 

When  3k(x)  is  differentiable  (k  =  ri  +ti  -|-<2  +  ^2)  and  the  gradient  is  nonzero  its 
negative  provides  a  descent  direction.  Therefore  we  only  consider  the  nonsmooth 
case  K  <  ri  +  ti  +  t2  +  r2. 

Case  1.  /  €  Span{  Bi, . . .  ,  Bm  }• 

Solve the system

$$\delta I - \sum_{k=1}^m d_k B_k = 0, \tag{4.45}$$

$$(\kappa - r_1 - r_2)\,\delta + \sum_{k=1}^m d_k b_k = -1. \tag{4.46}$$

Noting the structure of $B_k$ in (4.41), equation (4.45) contributes $t_1(t_1+1)/2 + t_2(t_2+1)/2$ linear equations and (4.46) one more, in the $m+1$ unknowns $\delta, d_1, \ldots, d_m$. Equation (4.45) implies that the eigenvalues of $B(d)$ defined by (4.42) are all equal to $\delta$. Hence from equations (4.44) and (4.46) $g'_\kappa(x; d) = -1$, where the direction $d \in R^m$ has components $d_1, \ldots, d_m$. To first order all the eigenvalues $\lambda_{r_1+1}(x), \ldots, \lambda_{r_1+t_1}(x)$ decrease at the same rate along $d$, and all the eigenvalues $\lambda_{n-r_2-t_2+1}(x), \ldots, \lambda_{n-r_2}(x)$ increase at the same rate along $d$. Thus $\delta$ gives a first order estimate of the change in $\zeta$ along $d$. This case holds generically if the manifold defined by

$$\lambda_{r_1+1}(x) = \cdots = \lambda_{r_1+t_1}(x) = \zeta \tag{4.47}$$

and

$$\lambda_{n-r_2-t_2+1}(x) = \cdots = \lambda_{n-r_2}(x) = -\zeta \tag{4.48}$$

has dimension greater than zero, so $m \ge t_1(t_1+1)/2 + t_2(t_2+1)/2$.
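As an illustration of Case 1, the system (4.45)-(4.46) can be assembled and solved by least squares. This is only a sketch on hand-made data: the matrices `B_list`, the vector `b` and the helper name `case1_direction` are hypothetical, chosen so that $I = B_1 + B_2 + B_4$ with $t_1 = 2$, $t_2 = 1$ and $\kappa - r_1 - r_2 = 1$.

```python
import numpy as np

def case1_direction(B_list, b, kappa, r1, r2):
    """Least-squares solve of (4.45)-(4.46) for (delta, d); also returns the residual."""
    p = B_list[0].shape[0]
    m = len(B_list)
    iu = np.triu_indices(p)             # upper triangle is enough by symmetry
    M = np.zeros((len(iu[0]) + 1, m + 1))
    # Rows for (4.45): delta*I - sum_k d_k B_k = 0.  Entries in the zero
    # off-diagonal block give all-zero rows, which are harmless.
    M[:-1, 0] = np.eye(p)[iu]
    for k, Bk in enumerate(B_list):
        M[:-1, k + 1] = -Bk[iu]
    # Row for (4.46): (kappa - r1 - r2)*delta + b^T d = -1
    M[-1, 0] = kappa - r1 - r2
    M[-1, 1:] = b
    rhs = np.zeros(M.shape[0])
    rhs[-1] = -1.0
    sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    residual = np.linalg.norm(M @ sol - rhs)
    return sol[0], sol[1:], residual

# Hypothetical data with t1 = 2, t2 = 1, r1 = r2 = 0, kappa = 1.
B_list = [
    np.diag([1.0, 0.0, 0.0]),
    np.diag([0.0, 1.0, 0.0]),
    np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
    np.diag([0.0, 0.0, 1.0]),
]
b = np.array([1.0, 2.0, 0.5, -1.0])
kappa, r1, r2 = 1, 0, 0

delta, d, residual = case1_direction(B_list, b, kappa, r1, r2)

# Directional derivative from (4.44): b^T d plus the kappa - r1 - r2 largest
# eigenvalues of B(d).
B_d = sum(dk * Bk for dk, Bk in zip(d, B_list))
gamma = np.sort(np.linalg.eigvalsh(B_d))[::-1]
slope = b @ d + gamma[:kappa - r1 - r2].sum()
print(delta, d, slope)
```

A nonzero residual would indicate that Case 1 does not apply to the data supplied.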

Case 2. Case 1 does not apply and $\{ I, B_1, \ldots, B_m \}$ has full rank $t_1(t_1+1)/2 + t_2(t_2+1)/2$.

Solve the linear system

$$\mathrm{tr}(U) + \mathrm{tr}(V) = \kappa - r_1 - r_2, \tag{4.49}$$

$$-\langle B_k, T \rangle = b_k, \quad k = 1, \ldots, m, \tag{4.50}$$

for the dual matrices $U \in S_{t_1}$ and $V \in S_{t_2}$, where $T$ is defined by (4.43). This is a system of $m+1$ equations in $t_1(t_1+1)/2 + t_2(t_2+1)/2$ unknowns. A similar argument to that given in Section 3.6 shows that a unique solution exists. If $U$ satisfies $0 \le U \le I$ and $V$ satisfies $0 \le V \le I$ then $0 \in \partial g_\kappa(x)$, so $x$ is a stationary point.

Theorem 4.10 Suppose (4.49) and (4.50) are satisfied but $0 \notin \partial g_\kappa(x)$, so either $U$ or $V$ has an eigenvalue outside $[0,1]$.

If an eigenvalue $\theta$ of $U$ lies outside $[0,1]$, let $z \in R^{t_1}$ be the corresponding normalized eigenvector of $U$. Let $\beta \in R$ and solve

$$\delta I - \sum_{k=1}^m d_k B_k = \begin{bmatrix} \beta z z^T & 0 \\ 0 & 0 \end{bmatrix}. \tag{4.51}$$

Alternatively if an eigenvalue $\eta$ of $V$ lies outside $[0,1]$, let $y \in R^{t_2}$ be the corresponding normalized eigenvector of $V$. Let $\beta \in R$ and solve

$$\delta I - \sum_{k=1}^m d_k B_k = \begin{bmatrix} 0 & 0 \\ 0 & \beta y y^T \end{bmatrix}. \tag{4.52}$$

Then $\beta$ can be chosen so $d = [d_1, \ldots, d_m]^T$ is a descent direction.


Proof: First consider the case when an eigenvalue $\theta$ of $U$ lies outside $[0,1]$. The linear system (4.51) is solvable by hypothesis, although $d$ is not unique if $\{B_k\}$, $k = 1, \ldots, m$, are linearly dependent. Taking an inner product of (4.51) with $T$ defined in (4.43) gives

$$\delta\,\mathrm{tr}(T) - \sum_{k=1}^m d_k \langle B_k, T \rangle = \beta \langle z z^T, U \rangle,$$

so from (4.49) and (4.50), and since $\langle z z^T, U \rangle = z^T U z = \theta$,

$$\delta(\kappa - r_1 - r_2) + b^T d = \beta\theta. \tag{4.53}$$

From (4.42) and (4.51)

$$B(d) = \delta I - \begin{bmatrix} \beta z z^T & 0 \\ 0 & 0 \end{bmatrix}$$

has $t_1 + t_2$ eigenvalues $\delta - \beta, \delta, \ldots, \delta$.

If $\beta < 0$, the sum of the $\kappa - r_1 - r_2$ largest eigenvalues of $B(d)$ is $\delta(\kappa - r_1 - r_2) - \beta$. Hence (4.44) and (4.53) give

$$g'_\kappa(x; d) = \beta(\theta - 1). \tag{4.54}$$

Thus if $\theta > 1$ choosing $\beta < 0$ and solving (4.51) produces a descent direction.

On the other hand, if $\beta > 0$, the sum of the $\kappa - r_1 - r_2$ largest eigenvalues of $B(d)$ is $\delta(\kappa - r_1 - r_2)$. Then (4.44) and (4.53) give

$$g'_\kappa(x; d) = \beta\theta. \tag{4.55}$$

Thus if $\theta < 0$ choosing $\beta > 0$ and solving (4.51) produces a descent direction.

The proof follows the same lines when an eigenvalue $\eta$ of $V$ lies outside $[0,1]$ and (4.52) is solved instead of (4.51). ∎

Remark: In general, the system (4.51) splits the multiple eigenvalue $\lambda_{r_1+1} = \cdots = \lambda_{r_1+t_1} = \zeta$ to first order, reducing the approximate multiplicity by one, while (4.52) splits $\lambda_{n-r_2-t_2+1} = \cdots = \lambda_{n-r_2} = -\zeta$. A special case occurs when $t_1 = t_2 = 1$ (and consequently, by assumption, $\kappa = r_1 + r_2 + 1$). In this case both eigenvalues already have multiplicity one, but the direction $d$ splits the common value $\zeta = \lambda_{r_1+1} = -\lambda_{n-r_2}$.

Remark: For the case $\kappa = 1$ this result was given by Overton (1988). In the previous work with $\kappa = 1$ the inequalities $U \le I$ and $V \le I$ did not appear. This is because $\kappa = 1$ implies $r_1 = r_2 = 0$, and the conditions $\mathrm{tr}(U) + \mathrm{tr}(V) = 1$, $U \ge 0$ and $V \ge 0$ imply that $U \le I$ and $V \le I$.

Case 3. Neither Case 1 nor Case 2 applies. In this case degeneracy is said to occur. Generation of a descent direction is not straightforward.

4.7      Model  algorithms 

As in Section 3.7, algorithms for minimizing $g_\kappa(x)$ may be developed based on successive linear or quadratic programming.

Suppose that $x^*$ is a (local) minimizer of $g_\kappa(x)$, with corresponding values $r_1^*$, $r_2^*$, $t_1^*$ and $t_2^*$ (defined by (4.8)). The model algorithm must use estimates, say $r_1$, $r_2$, $t_1$ and $t_2$, of these quantities. Note that, in general, the matrix iterates generated by the algorithm will have eigenvalues which are strictly multiple only in the limit at $x = x^*$.

The basic step of a model algorithm for minimizing $g_\kappa(x)$ then becomes the solution of the following linear or quadratic program:

$$\min \quad (\kappa - r_1 - r_2)\,\delta + b^T d + \tfrac{1}{2} d^T H d \tag{4.56}$$

subject to

$$\delta I - \sum_{k=1}^m d_k Q_1^T A_k Q_1 = \mathrm{diag}(\lambda_{r_1+1}, \ldots, \lambda_{r_1+t_1}), \tag{4.57}$$

$$\delta I + \sum_{k=1}^m d_k Q_2^T A_k Q_2 = -\mathrm{diag}(\lambda_{n-r_2-t_2+1}, \ldots, \lambda_{n-r_2}). \tag{4.58}$$

(4.59) 

Here the quantities $Q_1$, $Q_2$, $\lambda_i$ and $b$ (defined by (4.40)) are evaluated at the current point $x$. The new trial point is $x + d$, and $\delta$ is an estimate of $\zeta$ evaluated at the point $x + d$.

Equation (4.57) represents the appropriate linearization of the nonlinear system

$$\lambda_{r_1+1}(x + d) = \cdots = \lambda_{r_1+t_1}(x + d) = \delta, \tag{4.60}$$

while (4.58) is the linearization of

$$\lambda_{n-r_2-t_2+1}(x + d) = \cdots = \lambda_{n-r_2}(x + d) = -\delta. \tag{4.61}$$

The first two terms in the objective function (4.56) represent a linearization of $g_\kappa(x + d)$, while the third term may be used to incorporate second derivative information. The Lagrange multipliers corresponding to the $t_1(t_1+1)/2$ equality constraints (4.57) make up a dual matrix estimate of $U$, while the Lagrange multipliers corresponding to the $t_2(t_2+1)/2$ equality constraints (4.58) make up a dual matrix estimate of $V$.

Inequalities may be added to the quadratic program to ensure that linearizations of the inequalities $\lambda_i(x + d) \ge \delta$ for $i = 1, \ldots, r_1$, $-\delta \le \lambda_i(x + d) \le \delta$ for $i = r_1 + t_1 + 1, \ldots, n - r_2 - t_2$ and $\lambda_i(x + d) \le -\delta$ for $i = n - r_2 + 1, \ldots, n$ are satisfied. A trust region constraint may be added to ensure a reduction in the objective function at the trial point. See Overton (1990) for further explanation of these ideas.


A      Appendix 

This Appendix discusses further properties of the sets $\psi_{n,\kappa}$ and $\Psi_{n,\kappa}$ which, while not needed in the derivation of the subdifferential of $g_\kappa(A)$ given in Section 4.1, are of some independent interest. They lead to an alternative representation for the subdifferential of $g_\kappa(A)$, in much the same way that Fan's theorem was obtained in Section 3.1 as a corollary of our main results and led to (3.28), the alternative form for the subdifferential of $f_\kappa(A)$.

As already mentioned, the techniques in Section 4 are related to the idea of representing a scalar $\alpha$ by its positive part $\alpha_+ = \max\{0, \alpha\}$ and its negative part $\alpha_- = \max\{0, -\alpha\}$, so $\alpha = \alpha_+ - \alpha_-$ and $|\alpha| = \alpha_+ + \alpha_-$. The same definition applies to a vector componentwise, and, as will be seen, can also be extended in a natural way to matrices.

Consider the set

$$\xi_{n,\kappa} = \{ w \in R^n : -e \le w \le e, \quad \sum_{i=1}^n |w_i| = \kappa \}. \tag{A.1}$$

It is immediately apparent that $\xi_{n,\kappa} \subset \psi_{n,\kappa}$, since for any $w \in \xi_{n,\kappa}$, setting $u = w_+$, $v = w_-$ satisfies the requirements $w = u - v$, $0 \le u \le e$, $0 \le v \le e$, and $e^T(u + v) = \kappa$. However, the example $n = \kappa = 1$ and $u = 0.75$, $v = 0.25$ shows that the converse does not hold, since $w = 0.5 \in \psi_{n,\kappa}$, but $w \notin \xi_{n,\kappa}$. Instead, the following holds:
Lemma A.1

$$\psi_{n,\kappa} = \{ w \in R^n : -e \le w \le e, \quad \sum_{i=1}^n |w_i| \le \kappa \}. \tag{A.2}$$

Proof: Let $w \in \psi_{n,\kappa}$, so $w = u - v$, $0 \le u \le e$, $0 \le v \le e$, and $e^T(u + v) = \kappa$. Then

$$|w_i| = |u_i - v_i| \le \max\{u_i, v_i\} \le u_i + v_i$$

so $w$ is an element of the right-hand side of (A.2). Conversely, if $w$ is an element of the right-hand side, the following algorithm produces vectors $u$ and $v$ which demonstrate that $w = u - v \in \psi_{n,\kappa}$.

$\Delta := \kappa - \sum_{i=1}^n |w_i|$

For $i = 1, \ldots, n$ do

    $\alpha := \tfrac{1}{2} \min\{1 - |w_i|, \Delta\}$
    $u_i := [w_i]_+ + \alpha$
    $v_i := [w_i]_- + \alpha$
    $\Delta := \Delta - 2\alpha$.

Each step enforces $w_i = u_i - v_i$, $0 \le u_i \le 1$, $0 \le v_i \le 1$ but increases $e^T(u + v)$ until it equals $\kappa$. Note that $\Delta$ must be nonnegative initially and zero on termination (since $\sum_{i=1}^n |w_i| \le \kappa \le n$). ∎
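The algorithm in the proof translates directly into code. The sketch below is one possible transcription; the function name `split_vector` and the second test vector are illustrative assumptions. On the example $n = \kappa = 1$, $w = 0.5$ from the text it reproduces $u = 0.75$, $v = 0.25$.

```python
import numpy as np

def split_vector(w, kappa):
    """Given -e <= w <= e with sum|w_i| <= kappa <= n, return u, v with
    w = u - v, 0 <= u <= e, 0 <= v <= e and e^T (u + v) = kappa."""
    w = np.asarray(w, dtype=float)
    u = np.maximum(w, 0.0)              # [w]_+
    v = np.maximum(-w, 0.0)             # [w]_-
    slack = kappa - np.abs(w).sum()     # Delta in the text
    for i in range(w.size):
        alpha = 0.5 * min(1.0 - abs(w[i]), slack)
        u[i] += alpha
        v[i] += alpha
        slack -= 2.0 * alpha
    return u, v

# Example from the text: n = kappa = 1, w = 0.5.
u1, v1 = split_vector([0.5], 1)

# A second, hypothetical example with n = 3 and kappa = 2.
w2 = np.array([0.5, -0.2, 0.0])
u2, v2 = split_vector(w2, 2)
print(u1, v1, u2, v2)
```

Adding the same amount $\alpha$ to both $u_i$ and $v_i$ leaves the difference $u_i - v_i$ unchanged, which is why the loop can raise $e^T(u+v)$ without disturbing $w$.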

An immediate corollary is

Corollary A.2 The set of extreme points of the compact convex set $\psi_{n,\kappa}$ is

$$\{ w \in R^n : w_i \in \{0, \pm 1\}, \ \text{for } i = 1, \ldots, n, \quad \sum_{i=1}^n |w_i| = \kappa \},$$

that is the n-vectors with $\kappa_1$ elements equal to $+1$, $\kappa_2$ elements equal to $-1$, and all other elements equal to zero, where $\kappa_1 \ge 0$, $\kappa_2 \ge 0$ and $\kappa_1 + \kappa_2 = \kappa$.

These results are now extended to the set $\Psi_{n,\kappa}$, by consideration of the eigenvalues of $W = U - V$. The first step is the following lemma, for which it is necessary to order the eigenvalues of $U$, $V$ and $W$ so as to be able to exploit a classical result of Weyl. (The last statement in this lemma is not actually needed but is included for completeness.)
Lemma A.3 Let $W = U - V$, where $U \ge 0$, $V \ge 0$ and $\mathrm{tr}(U) + \mathrm{tr}(V) = \kappa$. Let $\sigma_1 \ge \cdots \ge \sigma_n$, $\tau_1 \ge \cdots \ge \tau_n$, and $\omega_1 \ge \cdots \ge \omega_n$ respectively denote the eigenvalues of $U$, the eigenvalues of $V$ and the eigenvalues of $W = U - V$. Then

$$\sum_{i=1}^n |\omega_i| \le \kappa. \tag{A.3}$$

Moreover, equality holds in (A.3) if and only if, for $i = 1, \ldots, n$,

$$\sigma_i \tau_{n-i+1} = 0 \tag{A.4}$$

and

$$\omega_i = \begin{cases} \sigma_i & \text{if } \sigma_i > 0; \\ 0 & \text{if } \sigma_i = \tau_{n-i+1} = 0; \\ -\tau_{n-i+1} & \text{if } \tau_{n-i+1} > 0. \end{cases} \tag{A.5}$$

Proof: Weyl's theorem (see e.g. Horn and Johnson (1985), Section 4.3) shows that for $j = 1, \ldots, n$

$$\sigma_n - \tau_{n-j+1} \le \omega_j \le \sigma_j - \tau_n.$$

As $U \ge 0$, $V \ge 0$ this implies that

$$-\tau_{n-j+1} \le \omega_j \le \sigma_j, \tag{A.6}$$

so

$$|\omega_j| \le \max\{ \sigma_j, \tau_{n-j+1} \} \le \sigma_j + \tau_{n-j+1}. \tag{A.7}$$

Hence

$$\sum_{j=1}^n |\omega_j| \le \sum_{j=1}^n \sigma_j + \sum_{j=1}^n \tau_{n-j+1} = \kappa, \tag{A.8}$$

which establishes (A.3).

Suppose now that (A.3) is an equality. Then (A.8) holds with equality, i.e. (A.7) holds with equality for $j = 1, \ldots, n$. This implies that for each $j = 1, \ldots, n$ at least one of $\sigma_j$ and $\tau_{n-j+1}$ is zero, establishing (A.4).

If $\tau_n > 0$ equation (A.4) and the ordering of the eigenvalues imply that $U = 0$. Similarly if $\sigma_n > 0$ then $V = 0$. In both cases (A.5) is trivial. Consider now the case when $\sigma_n = \tau_n = 0$. If $\sigma_j > 0$ then $\tau_{n-j+1} = 0$ so (A.6) implies that

$$0 \le \omega_j \le \sigma_j.$$

Similarly if $\tau_{n-j+1} > 0$ then $\sigma_j = 0$ so

$$-\tau_{n-j+1} \le \omega_j \le 0.$$

If $\sigma_j = \tau_{n-j+1} = 0$ then $\omega_j = 0$. Thus if (A.3) holds with equality then

$$\kappa = \sum_{j=1}^n |\omega_j| \le \sum_{\sigma_j > 0} \sigma_j + \sum_{\tau_{n-j+1} > 0} \tau_{n-j+1} = \kappa.$$

The only way this can hold with equality is if (A.5) holds.

Conversely as $U \ge 0$, $V \ge 0$ and $\mathrm{tr}(U) + \mathrm{tr}(V) = \kappa$, equations (A.4) and (A.5) directly give (A.3) with equality. ∎
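A quick numerical illustration of Lemma A.3 follows. It is a sketch on made-up data (random positive semidefinite matrices, and a pair with mutually orthogonal ranges for the equality case); none of the specific matrices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, kappa = 5, 3.0

def random_psd(n, rank, rng):
    """Random positive semidefinite matrix of the given rank (illustrative helper)."""
    G = rng.standard_normal((n, rank))
    return G @ G.T

# Generic case: scale U, V so that tr(U) + tr(V) = kappa.
U = random_psd(n, 3, rng)
V = random_psd(n, 3, rng)
scale = kappa / (np.trace(U) + np.trace(V))
U, V = scale * U, scale * V
abs_sum = np.abs(np.linalg.eigvalsh(U - V)).sum()      # bounded by kappa, see (A.3)

# Equality case: U and V act on mutually orthogonal subspaces, so (A.4) holds.
Qo, _ = np.linalg.qr(rng.standard_normal((n, n)))
U_eq = Qo[:, :2] @ np.diag([1.0, 0.5]) @ Qo[:, :2].T
V_eq = Qo[:, 2:4] @ np.diag([0.9, 0.6]) @ Qo[:, 2:4].T
abs_sum_eq = np.abs(np.linalg.eigvalsh(U_eq - V_eq)).sum()
print(abs_sum, abs_sum_eq)
```

In the second case the traces sum to 3 and the eigenvalues of $U - V$ are $1, 0.5, 0, -0.6, -0.9$, so the bound is attained.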

The following results are stated using the positive and negative parts of a symmetric matrix, although they could also be stated directly in terms of eigenvalues. Let $W \in S_n$ have eigenvalues $\omega_i$ for $i = 1, \ldots, n$ and corresponding normalized eigenvectors $v_i$, so

$$W = \sum_{i=1}^n \omega_i v_i v_i^T.$$

The positive and negative parts of the symmetric matrix $W$ are then

$$W_+ = \sum_{i=1}^n [\omega_i]_+ v_i v_i^T, \tag{A.9}$$

$$W_- = \sum_{i=1}^n [\omega_i]_- v_i v_i^T. \tag{A.10}$$

Then $W_+ \ge 0$, $W_- \ge 0$, $W = W_+ - W_-$,

$$\mathrm{tr}(W) = \sum_{i=1}^n \omega_i = \mathrm{tr}(W_+) - \mathrm{tr}(W_-) \tag{A.11}$$

and

$$\sum_{i=1}^n |\omega_i| = \mathrm{tr}(W_+) + \mathrm{tr}(W_-). \tag{A.12}$$
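The definitions (A.9) and (A.10) can be computed from a symmetric eigendecomposition. A minimal sketch, with an arbitrary $2 \times 2$ test matrix and a hypothetical helper name `pos_neg_parts`:

```python
import numpy as np

def pos_neg_parts(W):
    """Positive and negative parts (A.9)-(A.10) of a symmetric matrix W."""
    omega, Vecs = np.linalg.eigh(W)
    # Scaling column i of Vecs by [omega_i]_+ (or [omega_i]_-) and multiplying
    # by Vecs^T forms the sum of rank-one terms in (A.9) (or (A.10)).
    W_plus = (Vecs * np.maximum(omega, 0.0)) @ Vecs.T
    W_minus = (Vecs * np.maximum(-omega, 0.0)) @ Vecs.T
    return W_plus, W_minus

W = np.array([[2.0, 1.0],
              [1.0, -3.0]])
W_plus, W_minus = pos_neg_parts(W)
omega = np.linalg.eigvalsh(W)
print(np.trace(W_plus) + np.trace(W_minus), np.abs(omega).sum())
```

The two printed numbers illustrate (A.12): the traces of the two parts add up to the sum of the absolute eigenvalues.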

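Now consider the set

$$\Xi_{n,\kappa} = \{ W \in S_n : -I \le W \le I, \quad \mathrm{tr}(W_+) + \mathrm{tr}(W_-) = \kappa \}. \tag{A.13}$$

Note that from equations (A.1) and (A.12)

$$\Xi_{n,\kappa} = \{ W \in S_n : W = X \mathrm{diag}(w) X^T, \quad X \in O_{n,n}, \quad w \in \xi_{n,\kappa} \}.$$

For any $W \in \Xi_{n,\kappa}$, setting $U = W_+$, $V = W_-$ shows that $W \in \Psi_{n,\kappa}$ and so $\Xi_{n,\kappa} \subset \Psi_{n,\kappa}$. The same example given earlier to show that $\xi_{n,\kappa}$ does not equal $\psi_{n,\kappa}$ also shows that $\Xi_{n,\kappa}$ does not equal $\Psi_{n,\kappa}$. Instead, the following holds.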
Corollary A.4

$$\Psi_{n,\kappa} = \{ W \in S_n : -I \le W \le I, \quad \mathrm{tr}(W_+) + \mathrm{tr}(W_-) \le \kappa \}. \tag{A.14}$$

Proof: Let $W \in \Psi_{n,\kappa}$. Since $W = U - V$ with $0 \le U \le I$, $0 \le V \le I$, $\mathrm{tr}(U) + \mathrm{tr}(V) = \kappa$, equations (A.6) and (A.8) show that $\Psi_{n,\kappa}$ is contained in the set on the right-hand side. To establish the reverse inclusion note that any $W$ in the set on the right side has eigenvalues $\omega = [\omega_1, \ldots, \omega_n]^T$ satisfying the conditions on the right side of (A.2). Therefore, from Lemma A.1, there exist $u$, $v$ such that $\omega = u - v \in \psi_{n,\kappa}$. Writing $W = X \mathrm{diag}(\omega) X^T$ with $X$ orthogonal, and taking $U = X \mathrm{diag}(u) X^T$ and $V = X \mathrm{diag}(v) X^T$, shows that $W = U - V \in \Psi_{n,\kappa}$. ∎

Remark: The lack of convexity of the sets $\xi_{n,\kappa}$ and $\Xi_{n,\kappa}$ limits their use. Hence the preference for exploiting $\psi_{n,\kappa}$ and $\Psi_{n,\kappa}$, which are convex since the restricting equations are linear, involving sums rather than sums of absolute values.

Characterization of the extreme points of $\Psi_{n,\kappa}$ is now possible.

Lemma A.5 The set

$$\{ YY^T - ZZ^T : Y^T Z = 0, \quad Y^T Y = I_{\kappa_1}, \quad Z^T Z = I_{\kappa_2}, \ \text{and } \kappa_1 + \kappa_2 = \kappa \}$$

is the set of extreme points of $\Psi_{n,\kappa}$.

Proof: Combining Lemma A.1 and Corollary A.4 gives

$$\Psi_{n,\kappa} = \{ W \in S_n : W = X \mathrm{diag}(w) X^T, \quad X \in O_{n,n}, \quad w \in \psi_{n,\kappa} \}.$$

The extreme points of $\Psi_{n,\kappa}$ therefore have the form $X \mathrm{diag}(w) X^T$ where $w$ is an extreme point of $\psi_{n,\kappa}$. Corollary A.2 then gives the desired result. ∎

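An obvious generalization of Fan's theorem (which we have nonetheless not encountered in the literature) now follows as a corollary:

Theorem A.6

$$g_\kappa(A) = \max \{ \mathrm{tr}(Y^T A Y) - \mathrm{tr}(Z^T A Z) : Y \in R^{n \times \kappa_1}, \ Z \in R^{n \times \kappa_2}, \ Y^T Z = 0, \ Y^T Y = I_{\kappa_1}, \ Z^T Z = I_{\kappa_2}, \ \text{and } \kappa_1 + \kappa_2 = \kappa \}.$$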
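Theorem A.6 can be illustrated numerically by building a maximizing pair $(Y, Z)$ from eigenvectors and comparing against randomly sampled feasible pairs. The sketch below uses a random symmetric test matrix; the helper names and the sampling scheme are illustrative assumptions, and random sampling only spot-checks the upper bound.

```python
import numpy as np

def g_kappa(A, kappa):
    """Sum of the kappa largest eigenvalues of symmetric A in absolute value."""
    abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
    return abs_eigs[:kappa].sum()

def maximizing_pair(A, kappa):
    """Y: eigenvectors for the nonnegative eigenvalues among the kappa largest
    in absolute value; Z: eigenvectors for the negative ones."""
    lam, X = np.linalg.eigh(A)
    top = np.argsort(-np.abs(lam))[:kappa]
    Y = X[:, top[lam[top] >= 0]]
    Z = X[:, top[lam[top] < 0]]
    return Y, Z

def objective(A, Y, Z):
    return np.trace(Y.T @ A @ Y) - np.trace(Z.T @ A @ Z)

rng = np.random.default_rng(2)
n, kappa = 6, 3
G = rng.standard_normal((n, n))
A = 0.5 * (G + G.T)

Y, Z = maximizing_pair(A, kappa)
best = objective(A, Y, Z)

# Random feasible pairs: split kappa orthonormal columns into Y (k1) and Z (kappa - k1).
trial_values = []
for _ in range(200):
    Qr, _r = np.linalg.qr(rng.standard_normal((n, kappa)))
    k1 = rng.integers(0, kappa + 1)
    trial_values.append(objective(A, Qr[:, :k1], Qr[:, k1:]))
print(best, g_kappa(A, kappa), max(trial_values))
```

Because the columns of $Y$ and $Z$ are eigenvectors, the objective reduces to the sum of the selected eigenvalues with the negative ones sign-flipped, which is exactly $g_\kappa(A)$.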


Lemma A.5 also leads immediately to another characterization of the elements achieving the maximum in (4.21), namely

Theorem A.7

$$\mathrm{argmax} \{ \langle A, W \rangle : W \in \Psi_{n,\kappa} \} = \mathrm{conv} \{ YY^T - ZZ^T : Y^T Z = 0, \ Y^T Y = I_{\kappa_1}, \ Z^T Z = I_{\kappa_2}, \ \kappa_1 + \kappa_2 = \kappa \ \text{and} \ \mathrm{tr}(Y^T A Y) - \mathrm{tr}(Z^T A Z) = g_\kappa(A) \}. \tag{A.15}$$

From the ordering (4.8) of the eigenvalues of $A$ the requirement $\mathrm{tr}(Y^T A Y) - \mathrm{tr}(Z^T A Z) = g_\kappa(A)$ means that the columns of $Y$ must include $r_1$ orthonormal eigenvectors for all the eigenvalues $\lambda_1, \ldots, \lambda_{r_1}$ and the columns of $Z$ must include $r_2$ orthonormal eigenvectors for all the eigenvalues $\lambda_{n-r_2+1}, \ldots, \lambda_n$, so $\kappa_1 \ge r_1$ and $\kappa_2 \ge r_2$. The remaining columns of $Y$ and $Z$ may be any orthonormal sets of eigenvectors corresponding to $\lambda_{r_1+1} = \cdots = \lambda_{r_1+t_1}$ and $\lambda_{n-r_2-t_2+1} = \cdots = \lambda_{n-r_2}$ respectively. This leads to the following characterization of the subdifferential of $g_\kappa(A)$.

Corollary A.8

$$\partial g_\kappa(A) = \mathrm{conv} \{ YY^T - ZZ^T :$$

the columns of $Y$ form an o.n. set of $\kappa_1$ eigenvectors for $\lambda_1, \ldots, \lambda_{\kappa_1}$,
the columns of $Z$ form an o.n. set of $\kappa_2$ eigenvectors for $\lambda_{n-\kappa_2+1}, \ldots, \lambda_n$,
where $r_1 \le \kappa_1 \le r_1 + t_1$, $r_2 \le \kappa_2 \le r_2 + t_2$, and $\kappa_1 + \kappa_2 = \kappa$ $\}$

The presence of the convex hull operation means that this form is not as computationally convenient as that given in Corollary 4.6, nor does it display the structure of the subdifferential revealed there.

REFERENCES 

1. F. Alizadeh (1991), Private communication.

2. V. I. Arnold (1971), "On matrices depending on parameters", Russian Mathematical Surveys 26,2, pp. 29-43.

3. R. Bellman (1970), Introduction to Matrix Analysis, McGraw-Hill, New York, 2nd edition.

4. R. Bellman and K. Fan (1963), "On systems of linear inequalities in matrix variables", in: V. L. Klee ed., Convexity, American Mathematical Society, Providence, R.I., pp. 1-11.

5. F. Clarke (1983), Optimization and Nonsmooth Analysis, John Wiley, New York. Reprinted by SIAM, Philadelphia, 1990.



6. B. D. Craven and B. Mond (1981), "Linear programming with matrix variables", Linear Algebra and its Applications 38, pp. 73-80.

7. J. E. Cullum, W. E. Donath and P. Wolfe (1975), "The minimization of certain nondifferentiable sums of eigenvalues of symmetric matrices", Mathematical Programming Study 3, pp. 35-55.

8. K. Fan (1949), "On a theorem of Weyl concerning the eigenvalues of linear transformations", Proceedings of the National Academy of the Sciences of U.S.A. 35, pp. 652-655.

9. P. A. Fillmore and J. P. Williams (1971), "Some convexity theorems for matrices", Glasgow Mathematical Journal 12, pp. 110-117.

10. R. Fletcher (1985), "Semi-definite matrix constraints in optimization", SIAM Journal on Control and Optimization 23, pp. 493-513.

11. S. Friedland (1981), "Convex spectral functions", Linear and Multilinear Algebra 9, pp. 299-316.

12. S. Friedland, J. Nocedal and M. L. Overton (1987), "The formulation and analysis of numerical methods for inverse eigenvalue problems", SIAM Journal on Numerical Analysis 24, pp. 634-667.

13. G. H. Golub and C. van Loan (1983), Matrix Computations, Johns Hopkins University Press, Baltimore.

14. R. A. Horn and C. Johnson (1985), Matrix Analysis, Cambridge University Press, New York.

15. T. Kato (1982), A Short Introduction to Perturbation Theory for Linear Operators, Springer-Verlag, New York.

16. M. L. Overton (1988), "On minimizing the maximum eigenvalue of a symmetric matrix", SIAM Journal on Matrix Analysis and Applications 9, pp. 256-268.

17. M. L. Overton (1990), "Large-scale optimization of eigenvalues", NYU Computer Science Dept Report 505. To appear in SIAM Journal on Optimization.

18. M. L. Overton and R. S. Womersley (1988), "On minimizing the spectral radius of a nonsymmetric matrix function: Optimality conditions and duality theory", SIAM Journal on Matrix Analysis and Applications 9, pp. 473-498.

19. M. L. Overton and R. S. Womersley (to appear), "Second derivatives for optimizing eigenvalues of symmetric matrices", in preparation.



20. L. Qi and R. S. Womersley (to appear), "On extreme singular values", in preparation.

21. F. Rendl and H. Wolkowicz (1990), "A projection technique for partitioning the nodes of a graph", Department of Combinatorics and Optimization Report 90-20, University of Waterloo.

22. R. T. Rockafellar (1970), Convex Analysis, Princeton University Press, Princeton, NJ.

23. A. H. Sameh and J. A. Wisniewski (1982), "A trace minimization algorithm for the generalized eigenvalue problem", SIAM Journal on Numerical Analysis 19, pp. 1243-1259.

24. A. Shapiro (1985), "Extremal problems on the set of nonnegative definite matrices", Linear Algebra and its Applications 67, pp. 7-18.

25. A. Shapiro and J. D. Botha (1988), "Dual algorithms for orthogonal procrustes rotations", SIAM Journal on Matrix Analysis and Applications 9, pp. 378-383.

26. G. A. Watson (1990), Private communication.

27. H. Wielandt (1955), "An extremum property of sums of eigenvalues", Proceedings of the American Mathematical Society 6, pp. 106-110.




NYU COMPSCI TR-566
Overton, Michael L.
Optimality conditions and duality theory for


